1 Introduction

Traditional efforts to model transportation in large city regions operated at an aggregate level, splitting the urban area into a small number of zones and forecasting trips between these zones. The classic four-stage Urban Transportation Modelling System (UTMS) is a common example, including a gravity model to distribute trips between zones.

Aggregate models suffer from limited sensitivity to interesting policy questions [51]. While aggregate approaches can be suitable for projecting a continuation of current trends, they are unable to anticipate the effects of many major policy changes. For example, it would be difficult to model the effects of introducing road pricing or urban growth boundaries, or to project the response to major structural changes in the economics of transportation.

Disaggregate models may prove more suitable for tackling such questions, by modelling the behaviour of individual persons and households. While it is hard to understand the behaviour of a large group of persons with only aggregate statistics about these persons, behaviour is easier to grasp at the level of the individual person or household. Disaggregate models do not aim to predict the behaviour of individuals, but to understand behaviour at that level and use it to make accurate projections at the aggregate level.

Agent-based microsimulation models represent the finest level of disaggregation in current practice. These models forecast the future state of an aggregate system by simulating the behaviour of a number of individual agents over time. In travel demand modelling, the system is usually the spatial arrangement of travel patterns (including the mode of travel used), and the agents are usually persons, families or households. The execution of such a model can be divided into two steps: the creation of an initial set of agents, describing each agent and the system's state at some initial time; and a series of subsequent steps forward, where the state of each agent and the system as a whole is advanced by a timestep (for example, one year per step).
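The two-step structure described above can be sketched in a few lines of Python. This is only a minimal illustration: the `Person` class, the `create_initial_agents`, `step` and `run` functions, and the choice of age as the only attribute are hypothetical and do not come from any particular model. Real models update many attributes per timestep and usually do so stochastically.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """A hypothetical agent with a single attribute."""
    age: int


def create_initial_agents(ages):
    """Step 1: build the initial set of agents (here, from a list of ages)."""
    return [Person(age=a) for a in ages]


def step(agents, years=1):
    """Step 2: advance every agent by one timestep (one year by default)."""
    for person in agents:
        person.age += years


def run(agents, n_steps):
    """Advance the agents n_steps times, recording an aggregate statistic
    (the mean age) after each step."""
    history = []
    for _ in range(n_steps):
        step(agents)
        history.append(sum(p.age for p in agents) / len(agents))
    return history


agents = create_initial_agents([30, 40])
mean_ages = run(agents, n_steps=3)
```

The point of the sketch is the separation of concerns: the quality of the aggregate output in `history` depends both on the update rule in `step` and on the initial agents passed to `run`.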

The construction of the initial set of agents is often known as population synthesis, since a “population” of agents must be created. Data is typically not available for the true persons and their attributes at the initial time; hence the initial population is synthetic. A good representation is critical to support a good microsimulation model; “Garbage In, Garbage Out” is a common phrase in computer science, implying that even a good method will produce bad results if its input is poor.

When analyzing behaviour at the level of individual persons, it is possible to observe and model interesting connections between persons. For example, members of a family do not act entirely independently; they share resources and may choose to travel together in a single vehicle, to adjust their travel patterns to suit each other's schedules, or to make decisions about home ownership based on all family members' needs. However, to represent both individual behaviour and family-level behaviour in an agent-based framework, the relationships between individual persons must be known so that persons can be grouped into family units.

This thesis focuses on these problems, examining the methods necessary to construct a complete population of persons, families and households for the Integrated Land Use, Transportation and Environment (ILUTE) modelling effort at the University of Toronto. In particular, much of the thesis is concerned with the Iterative Proportional Fitting (IPF) method, a data fusion technique that underlies most population synthesis procedures. While the ILUTE model is the specific context for this thesis, the methods and discussion are relevant to a broader audience. They should be useful to anyone performing agent-based simulation using census data, and may provide new insights to anyone using the Iterative Proportional Fitting procedure for data fusion.
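As a rough illustration of the idea behind IPF, the sketch below fits a two-dimensional seed table to target row and column totals by alternately rescaling rows and columns until the totals match. The function name `ipf_2d`, its parameters and the example numbers are assumptions made for illustration only. They are not taken from the ILUTE implementation, which applies the method to tables with more dimensions.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Fit a 2D table to target margins by iterative proportional fitting.

    seed        -- list of lists of non-negative numbers (the starting table)
    row_targets -- desired row totals
    col_targets -- desired column totals (same grand total as row_targets)
    """
    table = [list(row) for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Scale each row so that it sums to its target.
        for i, row in enumerate(table):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [value * factor for value in row]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        error = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if error < tol:
            break
    return table


# Hypothetical example: a 2x2 seed table fitted to new margins.
fitted = ipf_2d([[1.0, 2.0], [3.0, 4.0]],
                row_targets=[40.0, 60.0],
                col_targets=[35.0, 65.0])
```

The fitted table matches both sets of target totals while preserving the association structure of the seed: in the 2x2 case, the odds ratio of the fitted table equals that of the seed table.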

The remainder of this thesis is structured as follows. First, previous work is reviewed, covering the ILUTE model, the mathematics and notation used for fitting contingency tables, and earlier population synthesis procedures. The following chapter reviews the data used for synthesis, including definitions of the agents, attributes, and population universes. Chapter 4 takes a “brainstorming” approach to some of the problems with existing population synthesis procedures, and discusses some potential improvements to established methods. This carries directly into the following chapter, which covers the implementation of the ILUTE population synthesizer, including a detailed application of many of the new ideas. The next chapter then uses this implementation to conduct a series of experiments evaluating the new methodological ideas. The final chapter examines the results of the final synthesis and summarizes the findings of the thesis.