Introduction to random utility discrete choice models

From DDWiki


Motivation

As designers, whether focused on satisfying user wants or on making profit for a firm, we are interested in the preferences that people have and the choices that they make. Formal mathematical models of preference and choice structures built upon empirical data help designers make predictions about the appeal of new products or changes to existing products. These models can help inform intuition and assist the designer in understanding and designing for the market, avoiding the tendency for designers to design only with respect to the preferences of the people they know best - themselves. In this class, we will consider one class of consumer choice models built on utility theory.

Utility Theory

Utility is a ubiquitous concept in economics as an abstract measurement of the degree of goal-attainment or want-satisfaction provided by a product or service. We cannot measure directly how much utility a person may gain from a product; however, we can make inferences about utility based on the person’s behavior, if we presume that people act rationally. In computer science, a rational agent is defined as one that acts to attain its goals. Likewise, in economics we assume that a rational person acts to increase her utility.

All else being equal, if a rational consumer is given a choice between product A, with utility uA = 1, and product B, with utility uB = 2, she will choose product B because it provides more utility. In general, given a set of alternatives j = 1,2,...,J, a rational person will choose the alternative that provides the highest utility, so that alternative j is chosen if  u_j>\{u_{j'}\}_{\forall j' \neq j}  . This deterministic model does not take into account the degree to which the utility of one product exceeds the utility of another. For instance, if uA = 1 then product B will be chosen if uB > 1, regardless of whether uB = 1.0001 or uB = 1000. In reality, uncertainty in utility estimates would lead one to be more confident in predicting choice B if uB = 1000 and less confident if uB = 1.0001.

Random Utility Discrete Choice Models

Even if we assume that choices are made rationally, in general we cannot measure utility (predict choices) exactly because, for example, we may not be able to observe or measure every characteristic of the individual, product, or choice situation that affects choice behavior. However, if we can observe some information about the individual, the product, or the choice situation, we can use that information to help predict choice. So, in random utility models we presume that the utility uij provided to individual i by product j is composed of a deterministic component vij, which can be calculated based on observed characteristics, and a stochastic error component εij, which is unobserved, so that


    u_{ij} = v_{ij}+ \varepsilon_{ij} (1)


Later we will discuss how to estimate the observable component of utility vij for individual i and product j using data, but for now we take it as given. Because we never observe the error component εij, we do not have enough information to predict a specific individual’s choice on a specific choice occasion, but, as in regression, we can make predictions about the patterns of choices over many individuals and many choice occasions. The probability Pij of individual i choosing product j from a set of products is


    P_{ij} = Pr \left[ u_{ij}>\{u_{ij'}\}_{\forall j' \neq j} \right]  (2)


     =  Pr \left[ v_{ij} + \varepsilon_{ij} >  \{ v_{ij'} + \varepsilon_{ij'}  \}_{\forall j' \neq j} \right]
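
Eq. (2) can be approximated by simulation before any particular error distribution is chosen. The sketch below is illustrative only: the function name, the number of draws, and the choice of iid standard normal errors are assumptions made for the example, not part of the model definition.

```python
import random

def simulate_choice_probabilities(v, draw_error, n_draws=20000, seed=0):
    """Estimate P_j = Pr[v_j + eps_j exceeds every other utility] by simulation."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        # Total utility = observed part + one random error draw per alternative.
        u = [v_j + draw_error(rng) for v_j in v]
        counts[u.index(max(u))] += 1
    return [c / n_draws for c in counts]

# Illustrative assumption: iid standard normal errors, two alternatives.
probs = simulate_choice_probabilities([1.0, 2.0], lambda rng: rng.gauss(0.0, 1.0))
print(probs)
```

Any other sampler can be passed as `draw_error`, so the same routine can be reused once a specific error distribution has been assumed.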


The Unobservable Component of Utility ε

The ε error terms are unobserved random variables that are described by a probability distribution. In general, this may be a joint distribution of all the error terms, so we use the vector   \boldsymbol{\varepsilon}_{i} = \left[  \begin{array}{ c c c c} \varepsilon_{i1} & \varepsilon_{i2} & \cdots & \varepsilon_{iJ}  \\   \end{array} \right]^T  , which aggregates the error terms for all J products, and we describe its probability distribution by the cumulative distribution function (CDF) F_{\mathrm{\varepsilon}}(\boldsymbol{\varepsilon}) and its corresponding probability density function (PDF) f_{\mathrm{\varepsilon}}(\boldsymbol{\varepsilon}).

Let us examine a simple case where the choice set is composed of only two products, and we can generalize later. In this case


 P_1 = Pr \left[ v_{1} + \varepsilon_{1} >   v_{2} + \varepsilon_{2} \right]   (3)


  = Pr \left[ \varepsilon_{2} < v_{1} -   v_{2} +\varepsilon_{1} \right]


For a given value of ε1, the probability in Eq.(3) is F_{\varepsilon_2}(v_1 - v_2 + \varepsilon_1): the CDF of the random variable ε2 evaluated at the point v1 – v2 + ε1, i.e., the probability that ε2 is less than the value v1 – v2 + ε1, given ε1. To find P1, we evaluate this CDF at each value of ε1 and integrate across all values of ε1.


  P_1 =  \int_{\varepsilon_1=-\infty}^\infty \Big(  \int_{\varepsilon_2=-\infty}^{v_{1} - v_{2} +\varepsilon_{1}} f_\varepsilon(\varepsilon_1,\varepsilon_2)\,d\varepsilon_2 \Big) d\varepsilon_1 (4)


In general, for a set of products


    P_{j}  =  Pr \left[ v_{j} + \varepsilon_{j} >  \{ v_{k} + \varepsilon_{k}  \}_{\forall k \neq j} \right]


  = Pr \left[\{ \varepsilon_{k} < v_{j} -  v_{k} + \varepsilon_{j} \} _{\forall k \neq j}  \right]


   =  \int_{\varepsilon_j=-\infty}^\infty \Big(  \int_{\varepsilon_1=-\infty}^{v_{j} - v_{1} +\varepsilon_{j}} \int_{\varepsilon_2=-\infty}^{v_{j} - v_{2} +\varepsilon_{j}} \cdots \int_{\varepsilon_J=-\infty}^{v_{j} - v_{J} +\varepsilon_{j}}   f_\varepsilon(\boldsymbol{\varepsilon})\,d \boldsymbol{\tilde{\varepsilon}} \Big) d\varepsilon_j (5)


where  d \boldsymbol{\tilde{\varepsilon}} = d\varepsilon_J...d\varepsilon_{j+1}d\varepsilon_{j-1}...d\varepsilon_{2}d\varepsilon_{1}


The Probit Model

Most commonly in statistics, unobserved random error terms are taken to be normally distributed (e.g., in least squares regression). The central limit theorem provides a theoretical justification for this choice in the absence of other information about distributional forms. If f_\varepsilon(\varepsilon) in Eq. (5) is assumed to be a multivariate joint normal distribution with mean vector θ and covariance matrix \wedge, this is called the probit model. The probit model is quite general; however, it does not yield a closed form solution and requires multidimensional integration.

Some econometricians have alternatively used a restricted form of the probit model where error terms are taken to be independently and identically distributed: i.e., the covariance matrix \wedge is assumed to be diagonal with equal variances. In this case, Eq.(5) reduces to a single dimensional integral:


  P_j= Pr \left[ \{ \varepsilon_{k} < v_{j} -   v_{k} + \varepsilon_{j} \}_{\forall k \neq j} \right]


   =  \int_ {\varepsilon_j=-\infty}^\infty  f_{\varepsilon_{j}}(\varepsilon_j) \Big( \prod_{k \neq j} \Big( 	 \int_{\varepsilon_k=-\infty}^{v_{j} -   v_{k} +\varepsilon_{j}} f_{\varepsilon_{k}}(\varepsilon_k) d\varepsilon_k \Big) \Big)d\varepsilon_j (6)


   =  \int_ {\varepsilon_j=-\infty}^\infty  f_{\varepsilon_{j}}(\varepsilon_j) \Big( \prod_{k \neq j} F_{\varepsilon_k} (v_{j} -   v_{k}+ \varepsilon_{j} ) \Big)d\varepsilon_j


This simplified form is desirable; however, the assumption of independence of the error terms is a restriction that leads to specific implications, which we will discuss later.
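
Because Eq. (6) is a one-dimensional integral, it can be evaluated with simple quadrature. The following is a minimal sketch assuming iid standard normal errors and a trapezoid rule on a truncated range; the function names, integration limits, and grid size are choices made for this illustration.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def iid_probit_probability(v, j, lo=-8.0, hi=8.0, n=4000):
    """Trapezoid-rule approximation of Eq. (6) with iid standard normal errors."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        e = lo + i * h  # value of eps_j at this grid point
        integrand = normal_pdf(e)
        for k, v_k in enumerate(v):
            if k != j:
                # Probability that eps_k < v_j - v_k + eps_j.
                integrand *= normal_cdf(v[j] - v_k + e)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * integrand
    return total * h

v = [1.0, 2.0, 1.5]
probs = [iid_probit_probability(v, j) for j in range(len(v))]
print(probs, sum(probs))
```

The printed probabilities should sum to approximately one, which is a quick sanity check on the quadrature.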

The Logit Model

To simplify matters more, econometricians often use an alternative assumption for the distribution of the error terms: Instead of normal, error terms are assumed to be independently and identically distributed (iid) following the double exponential (Gumbel, or Type I extreme value) distribution:


  F_{\varepsilon}(\varepsilon_j)=exp(-e^{-\varepsilon_j})


  f_{\varepsilon}(\varepsilon_j)=e^{-\varepsilon_j} \cdot exp(-e^{-\varepsilon_j})   (7)

Figure: Normal and double exponential distributions

This assumption yields the logit model. Unlike the normal distribution, there is no theoretical reason to believe that the double exponential is a good assumption for the error terms; however, under this assumption Pij in Eq.(5) reduces to a simple, explicit, usable form, and studies have shown that results obtained under this logit assumption are nearly indistinguishable from those produced by the probit model, except when large amounts of data are available. So, the logit assumption is a useful “engineering approximation”. The standard normal and double exponential PDFs are shown in the figure to the right.

Using Eq.(7) in Eq.(6), we have


 P_j= \int_ {\varepsilon_j=-\infty}^\infty  e^{-\varepsilon_{j}}exp(-e^{-\varepsilon_j}) \prod_{k \neq j}exp(-e^{-(v_j-v_k+{\varepsilon_j})})d\varepsilon_j (8)

Since vj – vj = 0, the factor exp(-e^{-\varepsilon_j}) is exactly the k = j term of the product, so it can be absorbed into the product and the expression is rewritten as


 P_j= \int_ {\varepsilon_j=-\infty}^\infty  e^{-\varepsilon_{j}} \prod_{k}exp(-e^{-(v_j-v_k+{\varepsilon_j})})d\varepsilon_j


 P_j= \int_ {\varepsilon_j=-\infty}^\infty  e^{-\varepsilon_{j}} exp(- \sum_k e^{-(v_j-v_k+{\varepsilon_j})})d\varepsilon_j (9)


 P_j= \int_ {\varepsilon_j=-\infty}^\infty  e^{-\varepsilon_{j}} exp(-e^{-\varepsilon_j} \sum_k e^{v_k-v_j} )d\varepsilon_j


We can solve this integral with a change of variables. Let t = exp(-εj). Then dt = -exp(-εj)dεj and dεj= -dt/t. For the integration limits: as εj approaches infinity, t approaches zero, and as εj approaches negative infinity, t approaches infinity. Rewriting Eq.(9) in terms of t:


 P_j= -\int_ {t=\infty}^0  exp(-t\sum_k e^{v_k-v_j}   ) dt


= -( \frac{1} {-\sum_ke^{v_k-v_j}}) exp(-t\sum_k e^{v_k-v_j}   )\Big|_{t=\infty}^0 (10)


=(\frac{1} {\sum_ke^{v_k-v_j}})-0


=(\frac{1} {\sum_ke^{v_k}e^{-v_j}})= (\frac{1} {e^{-v_j}\sum_ke^{v_k}})


P_j=\frac{e^{v_j}} {\sum_ke^{v_k}} (11)


The iid double exponential error term assumption has led to a very simple formula for choice probabilities with appropriate properties: choice probabilities range from zero to one and sum to one over all alternatives in the choice set.
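
Eq. (11) translates directly into a few lines of code. This is a minimal sketch; the function name is illustrative, and subtracting the largest utility before exponentiating is an implementation choice for numerical stability that leaves Eq. (11) unchanged.

```python
import math

def logit_probabilities(v):
    """Eq. (11): P_j = exp(v_j) / sum_k exp(v_k)."""
    v_max = max(v)  # common shift cancels in the ratio, avoids overflow
    exp_v = [math.exp(v_j - v_max) for v_j in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

# Two products with uA = 1 and uB = 2 as the observable utilities.
probs = logit_probabilities([1.0, 2.0])
print(probs)
```

Unlike the deterministic rule, the model assigns product A a nonzero probability even though its observable utility is lower.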

Independence of Irrelevant Alternatives

It is important to be aware that assuming independence of the error terms gives rise to a property called independence from irrelevant alternatives, or IIA. The property holds exactly for the logit model, and the restricted probit model exhibits very similar substitution patterns. We know that if a new alternative product is added to the choice set, some individuals who would otherwise have chosen a product in the initial choice set will instead choose the new product. The IIA property means the ratio of choice probabilities between any two alternatives is unaffected by the presence of a third alternative, and any new alternative introduced to a choice set will take its choice share proportionally from all other alternatives in the choice set. For the logit model, this is easy to show:


\frac{P_A}{P_B} = \frac {\frac{e^{v_A}} {\sum_{k} e^{v_k}} } {\frac{e^{v_B}} {\sum_k e^{v_k}} } = \frac{e^{v_A}} {e^{v_B}}    (12)


The IIA property is also known as the “red bus, blue bus problem” because of a famous illustration of this property: Let’s say commuters have the two options {car, blue bus} available to them and gain equal utility from each (vCAR = vBLUEBUS), therefore choosing each with probability 0.5. If a new product is added to the choice set that is very similar to one of the existing products {car, blue bus, red bus} with equal utility, the IIA property implies that the new product will draw choice proportionally from all other alternatives, so that PCAR = PBLUEBUS = PREDBUS = 0.333. In reality we would expect the red bus to draw far more commuters from the blue bus than from car travel since the two buses are very similar. Choice probabilities will likely be closer to PCAR = 0.5, PBLUEBUS = PREDBUS = 0.25. The IIA property would also imply, for instance, that the ratio of votes for Democratic and Republican candidates is unaffected by the presence of a third party candidate. Thus there are limitations to the applicability of models that possess the IIA property; however, a number of extensions exist to mitigate or eliminate this problem, and in many practical applications the IIA property is not problematic. For the remainder of this course we will use the simple logit model; however, interested students are welcome to research more advanced models.
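
The red bus, blue bus illustration can be reproduced numerically with the logit formula of Eq. (11). In this sketch the utilities are all set to zero, which is an arbitrary common value chosen for the example; only the equality of the utilities matters.

```python
import math

def logit_probabilities(v):
    """Logit choice probabilities for a dict of {alternative: utility}."""
    total = sum(math.exp(x) for x in v.values())
    return {name: math.exp(x) / total for name, x in v.items()}

before = logit_probabilities({"car": 0.0, "blue bus": 0.0})
after = logit_probabilities({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

# IIA: the car / blue bus ratio is the same with or without the red bus.
ratio_before = before["car"] / before["blue bus"]
ratio_after = after["car"] / after["blue bus"]
print(before, after, ratio_before, ratio_after)
```

The car share falls from one half to one third, even though intuition suggests the red bus should mostly take riders from the blue bus.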

The Observable Component of Utility v

The preceding discussion presumes that the observable component of utility vij is known for each individual i and each product j. We said vij is observable in that it is a function of the observable characteristics of the product, the individual, and the purchase situation. For now, we will limit our discussion so that vj depends only on the characteristics of the product, i.e., all individuals have the same observable component of utility, individual differences are described only by the random error term, and the index i is dropped. The term product attributes is used specifically to describe objective, measurable aspects of the product that are observed by and relevant to the consumer during the choice process. For example, fuel economy of a vehicle may be considered a product attribute, but “sportiness” would not fit the framework unless it can be quantified into discrete levels and assessed identically by all individuals. Transmission ratio is probably not a product attribute either, since it is generally observed by engineering designers rather than directly by customers (except in special cases). The values of the product characteristics of product j are written as the real-valued vector zj, and vj is a function of zj as well as the product’s price pj, which, by convention, is not included in zj.

Just as in regression, we do not know, in general, the functional form relating zj and pj to vj; however, if we have experience with choice models and experience in the problem domain, we may be able to posit reasonable functional relationships that produce good predictions. For example, let's take a simple linear utility function for automobiles that is a function of pj, gas mileage zj1, and performance measured as time to accelerate from 0-60 mph zj2:

    v_{j} = \beta_0 p_j + \beta_1 z_{j1} + \beta_2 z_{j2} (13)

where β0, β1, and β2 are coefficients. If we could observe vj directly, then we could collect data for various values of pj and zj and perform an ordinary regression to find the best fit values for the β coefficients. But vj is not observed; only choice is observed. In a similar way, though, we can use past data on choices among vehicles with various values for pj and zj to find the values of the β coefficients that result in choice predictions that best match the observed choice data, using a technique called maximum likelihood estimation.


Maximum likelihood estimation

In this case, we have 1) assumed the distribution of the error terms (double exponential for logit), and 2) assumed the functional form of vj with respect to observed characteristics. Now we want to find the model parameters (β coefficients) that would lead the model to make predictions that best match observed data (choices). To do this we search for the coefficients that maximize the likelihood that the choice model (with coefficients β) would generate the data we observed: i.e., the model predicts choices probabilistically, and we want to maximize the likelihood that choices predicted by the model would be exactly those observed. On a specific choice occasion with a fixed set of alternatives, the probability of the model selecting the same choice as the one observed for individual i is


  \prod_{j}P_j^{\phi_{ij}} (14)


where φij = 1 if individual i chooses product j, and φij = 0 otherwise. If this process is repeated for many individuals, the total number of individuals choosing product j is given by

    n_j = \sum_{i} \phi_{ij}

and the probability of the model generating the observed choices is


 \prod_jP_j^{n_j} (15)


We are searching for the values of β that maximize this quantity. To simplify calculations and avoid numerical difficulties, it is common practice to maximize the log of Eq.(15), which attains its maximum at the same β, rather than maximizing Eq.(15) directly. This is called the log-likelihood, often written LL. The maximum (log) likelihood β terms are therefore:


 \hat{\beta}= \underset{\beta}{arg max}(\sum_j n_j log P_j)  (16)

where P_j=\frac{e^{v_j}} {\sum_ke^{v_k}} for the logit case.
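
The objective in Eq. (16) is straightforward to evaluate for any candidate utilities. The sketch below uses hypothetical counts for three alternatives (they are not data from this article) and compares the log-likelihood of equal utilities with that of utilities set to the log of the observed shares.

```python
import math

def log_likelihood(counts, v):
    """Objective of Eq. (16): sum_j n_j * log P_j with logit P_j."""
    log_denominator = math.log(sum(math.exp(v_j) for v_j in v))
    # log P_j = v_j - log(sum_k exp(v_k))
    return sum(n_j * (v_j - log_denominator) for n_j, v_j in zip(counts, v))

counts = [60, 30, 10]  # hypothetical choice counts
ll_equal = log_likelihood(counts, [0.0, 0.0, 0.0])
ll_shares = log_likelihood(counts, [math.log(0.6), math.log(0.3), math.log(0.1)])
print(ll_equal, ll_shares)
```

The second value is larger (closer to zero) because utilities that reproduce the observed shares make the observed choices more probable.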


Example

Let’s suppose our choice set consists of four vehicles with prices and characteristics shown below

               A     B     C     D
pj ($1000s)    15    15    20    20
zj1 (mpg)      25    35    25    35
zj2 (sec)      6     8     8     6

Suppose we ask 100 people which vehicle each would choose, and we find that 25 choose product A, 30 choose product B, 5 choose product C, and 40 choose product D. Using the logit model in Eq.(11) for choice probabilities Pj and Eq.(13) as the form of the utility function vj we would solve for the β terms as:


 \underset{\beta_0,\beta_1,\beta_2}{maximize}(25logP_A + 30logP_B + 5logP_C + 40logP_D)(17)


where  P_j= \frac{exp(\beta_0 p_j + \beta_1 z_{1j} + \beta_2 z_{2j})} {\sum_{k}exp(\beta_0 p_k + \beta_1 z_{1k} + \beta_2 z_{2k})} (18)

To find the maximum by hand, we can take the gradient of the function, set it equal to zero, solve the resulting system of equations, and check the Hessian for sufficiency. Alternatively, we can use a numerical optimization tool, such as Excel Solver, to find the values for the β terms that solve Eq.(17). Using either technique, the solution is β0 = -0.132, β1 = 0.113, β2 = -0.474. We see that β0 is negative, indicating that increasing price will decrease utility (all else being equal), β1 is positive, indicating that increasing fuel economy will increase utility, and β2 is negative, indicating that increasing 0-60 time will decrease utility.
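
As one possible alternative to a spreadsheet solver, Eq. (17) can be maximized with plain gradient ascent. This is a sketch under stated assumptions, not the procedure used in the text: the centering and scaling of the attributes, the fixed step size, and the iteration count are all choices made here for numerical convenience.

```python
import math

# Data from the example: products A, B, C, D.
prices = [15.0, 15.0, 20.0, 20.0]  # $1000s
mpg = [25.0, 35.0, 25.0, 35.0]
secs = [6.0, 8.0, 8.0, 6.0]
counts = [25, 30, 5, 40]

# Assumption for convenience: center and scale each attribute.
# Adding a constant to every v_j leaves the logit probabilities unchanged.
centers = [17.5, 30.0, 7.0]
scales = [2.5, 5.0, 1.0]
rows = [
    [(x - c) / s for x, c, s in zip(attrs, centers, scales)]
    for attrs in zip(prices, mpg, secs)
]

def logit_probabilities(v):
    exp_v = [math.exp(v_j) for v_j in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

gamma = [0.0, 0.0, 0.0]  # coefficients on the scaled attributes
n_total = sum(counts)
step = 0.3  # hypothetical fixed step size
for _ in range(2000):
    v = [sum(g * x for g, x in zip(gamma, row)) for row in rows]
    probs = logit_probabilities(v)
    # Gradient of (1/N) * sum_j n_j log P_j is sum_j (n_j/N - P_j) * x_j.
    grad = [
        sum((n / n_total - p) * row[d] for n, p, row in zip(counts, probs, rows))
        for d in range(3)
    ]
    gamma = [g + step * dg for g, dg in zip(gamma, grad)]

# Undo the scaling to recover coefficients in the original units.
beta = [g / s for g, s in zip(gamma, scales)]
print([round(b, 3) for b in beta])
```

The recovered coefficients should land close to the values reported in the text; a production estimate would normally use a dedicated optimizer with a convergence check rather than a fixed iteration count.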

Note that in our example five individuals chose product C, even though it is more expensive, has worse fuel economy, and worse performance. While this goes against the utility trends in a deterministic utility model, random utility discrete choice models, such as the logit model, allow for random error and unobserved attributes that may affect the decisions of individuals while still capturing the overall trends. [Note: if unobserved attributes are correlated with observed attributes in the data, more advanced econometric assessment is required].

Using these newly obtained β estimates, and the corresponding model of choice, we can now make predictions about new products or changes to existing products. [Note: this is only valid if we assume that these changes may affect choices but will not affect underlying consumer preferences for attributes - this may not always be true]. Suppose we wanted to lower the price of product C to attract more buyers. How much would we have to lower the price to double market share (attract 10 out of 100 buyers instead of 5/100)? To make this prediction, we would simply solve


Find pC such that  P_C = \frac{exp(\beta_0 p_C + \beta_1 z_{1C} + \beta_2 z_{2C})} {\sum_{k} exp(\beta_0 p_{k} + \beta_1 z_{1k} + \beta_2 z_{2k})} = 0.10


using the beta values and characteristic values from above. In this case the answer is $14,350. So, vehicle C, with the least desirable characteristics, would have to drop its price below the prices of competitors in order to capture 10% of the market.
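
The price that gives product C a 10% share can be found with a one-dimensional root search. The sketch below uses bisection with the rounded β values reported above; the search bracket of $0 to $20,000 and the helper names are assumptions made for this illustration.

```python
import math

beta = (-0.132, 0.113, -0.474)  # estimates reported in the text
# (price in $1000s, mpg, 0-60 seconds) for products A, B, and D.
others = [(15.0, 25.0, 6.0), (15.0, 35.0, 8.0), (20.0, 35.0, 6.0)]
z_c = (25.0, 8.0)  # mpg and 0-60 seconds of product C

def utility(p, z1, z2):
    return beta[0] * p + beta[1] * z1 + beta[2] * z2

def share_of_c(p_c):
    exp_c = math.exp(utility(p_c, *z_c))
    exp_others = sum(math.exp(utility(*alt)) for alt in others)
    return exp_c / (exp_c + exp_others)

# The share of C falls as its price rises because beta_0 < 0.
lo, hi = 0.0, 20.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if share_of_c(mid) > 0.10:
        lo = mid  # share above target, so the target price is higher
    else:
        hi = mid
price_c = 0.5 * (lo + hi)
print(round(price_c, 2), share_of_c(price_c))
```

The result is in $1000s and should agree with the figure quoted in the text to within rounding of the β values.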

More on Functional Forms

In the previous example we assumed a linear functional form for v. In general, how does one know what functional form to use, and what kind of functional form for v should be assumed when there is no prior knowledge about the relationship between v, p, and z?

Just as in regression, we do not know, in general, the functional form relating zj and pj to vj; however, if we have experience with choice models and experience in the problem domain, we may be able to posit reasonable functional relationships that produce good predictions. In some cases, though, we may not have good intuition about what functional forms to assume for a particular product and set of product characteristics. One method is to simply try different functional forms and see which one results in the highest likelihood value. This can be dangerous in the absence of information about the problem: more general forms (say, assuming a quadratic rather than linear relationship) will always yield a likelihood value at least as good as the more restrictive forms they contain, so one must be wary of overfitting the data. So, while this might be a reasonable technique for testing whether the price relationship is linear or log, it is not a good idea to blindly test arbitrary functional form assumptions and pick the highest likelihood result without assessing its meaning.

Discretization

A somewhat more general technique is to divide the relevant range of each product characteristic in z and price p into discrete levels, estimate the preference coefficients β at those discrete levels, and then interpolate for intermediate values. This allows the model to capture a wide variety of shapes with respect to the real-valued product characteristics z and price p. For example, the graph below shows a hypothetical case where the underlying relationship between v and a single product characteristic z is s-shaped. If we discretize z and obtain preference estimates at the discrete levels (shown as circles), we can interpolate the s-shaped curve. However, if we assume that v is a linear, quadratic, or log function of z, then we obtain a more restrictive estimate that does not capture all of the detail.

Figure: Spline interpolation

This technique of discretizing and interpolating may not be feasible using data from the market, since we may not be able to describe existing market products in terms of a small number of discrete levels of each characteristic. However, if we are collecting choice data using a choice-based conjoint survey, it is feasible and often desirable.

First, we divide each product characteristic z into discrete levels that span the relevant domain of characteristic values. If the product characteristics are indexed by ζ, we divide each characteristic zζ into levels indexed by ω = {1, 2, 3, ..., Ωζ}. For example, if characteristic ζ=1 is fuel economy and fuel economy z1 ranges between, say, 10 mpg and 40 mpg, we might set levels at 10, 20, 30, and 40 mpg, so that ω = {1, 2, 3, 4} refers to {10 mpg, 20 mpg, 30 mpg, 40 mpg} respectively and Ω1=4.

Each product in the choice set must be coded with respect to these characteristic levels using dummy variables. Here we notate the dummy variables as δjζω, where δjζω = 1 if product characteristic ζ of product j is at level ω, and δjζω = 0 otherwise. We also include price in this set, with price indexed as ζ=0. Thus, any product j with product characteristics and price at the discrete levels can be coded as a set of 1’s and 0’s in δjζω ∀ζ,ω. Assuming that preferences are linear in the discretized set, we have


    v_j = \sum_{\zeta} \sum_{\omega} \beta_{\zeta\omega} \delta_{j\zeta\omega}


where the coefficients βζω are called part-worths because they describe the part of utility derived from attribute ζ being at level ω. There may be cases where linearity of the characteristics cannot be assumed because of interaction effects, i.e., the shape of preferences for one characteristic may depend on the value of another characteristic. However, we leave these as advanced cases that we do not address here.

Using the logit model, the probability of an individual choosing product j is then:


P_j = \frac{exp(v_j)} {\sum_{k}exp(v_{k})} = \frac{exp(\sum_{\zeta}\sum_{\omega} \beta_{\zeta\omega} \delta_{j\zeta\omega})} {\sum_{k}exp(\sum_\zeta \sum_\omega\beta_{\zeta\omega}\delta_{k\zeta\omega})} (19)


and the log likelihood that a model with part-worth coefficients βζω will reproduce the observed data φij (where φij=1 if individual i chooses product j, and φij=0 otherwise) is


    LL = \sum_{j} \sum_{i} \phi_{ij} \ln P_j


as derived before. Given a set of observed choice data φij we can find the coefficients βζω that maximize LL.

Example

In the previous vehicle example, we had

               A     B     C     D
pj ($1000s)    15    15    20    20
zj1 (mpg)      25    35    25    35
zj2 (sec)      6     8     8     6


We define the discrete levels as

ζ    symbol    level ω=1    level ω=2
0    p         $15000       $20000
1    z1        25           35
2    z2        6            8


The attributes of the four vehicles are thus encoded as:

δjζω        j=A    j=B    j=C    j=D
ζ=0 ω=1     1      1      0      0
ζ=0 ω=2     0      0      1      1
ζ=1 ω=1     1      0      1      0
ζ=1 ω=2     0      1      0      1
ζ=2 ω=1     1      0      0      1
ζ=2 ω=2     0      1      1      0


As before, given this choice set suppose that 25 respondents choose product A, 30 choose product B, 5 choose product C, and 40 choose product D. If the log likelihood is maximized using Excel Solver, the resulting βζω part-worths are:

βζω     ζ=0       ζ=1       ζ=2
ω=1     0.330     -0.565    0.474
ω=2     -0.330    0.565     -0.474

Notice that in this case, with only two levels per attribute, the results in the discretized model match the results from the linear model. β02 - β01 = -0.660 units of utility for a $5000 increase = -0.132 per $1000. β12 - β11 = 1.13 units of utility for a 10 mpg increase = 0.113 per mpg increase. β22 - β21 = -0.948 units of utility for a 2 second increase = -0.474 per second increase.
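
The dummy coding and the part-worth utility sum can be sketched in a few lines. The dictionary layout and function names below are illustrative choices; the levels, product attributes, and part-worth values are the ones tabulated in this example.

```python
import math

levels = {"p": [15, 20], "z1": [25, 35], "z2": [6, 8]}
part_worths = {"p": [0.330, -0.330], "z1": [-0.565, 0.565], "z2": [0.474, -0.474]}
products = {
    "A": {"p": 15, "z1": 25, "z2": 6},
    "B": {"p": 15, "z1": 35, "z2": 8},
    "C": {"p": 20, "z1": 25, "z2": 8},
    "D": {"p": 20, "z1": 35, "z2": 6},
}

def dummy_code(product):
    """delta[attribute][level index] is 1 if the product is at that level."""
    return {a: [1 if product[a] == lvl else 0 for lvl in lvls]
            for a, lvls in levels.items()}

def utility(product):
    """v_j = sum over attributes and levels of beta * delta."""
    delta = dummy_code(product)
    return sum(b * d for a in levels for b, d in zip(part_worths[a], delta[a]))

exp_v = {name: math.exp(utility(prod)) for name, prod in products.items()}
total = sum(exp_v.values())
probs = {name: e / total for name, e in exp_v.items()}
print(dummy_code(products["C"]), probs)
```

With these part-worths the predicted shares should come out close to the observed 25, 30, 5, and 40 percent.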

Model identification

To be precise, there are infinitely many sets of part-worth coefficients βζω that predict identical choice probabilities, and the results shown above are just one such set. This is because our model for v has extra degrees of freedom: i.e., there are more variables than equations in the system of equations. For instance, adding the same constant to every level of one characteristic ζ shifts every vj by that constant and leaves the choice probabilities unchanged. Any of the sets of betas that yield equivalent choice probabilities and log likelihood values are equivalent with respect to the choice model, and any can be used. If we wish to restrict the model to a single, repeatable answer (this is called model identification), we can code the βζω in terms of fewer variables (1 + ∑ζ(Ωζ-1) variables are needed), or we can add extra constraints to restrict the solution to a particular set of β values from the infinite set of equivalent values for easier interpretation. The solution shown above is the unique β solution maximizing the log likelihood where the average β value of each characteristic ζ across all of its levels ω is zero.
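
The lack of identification is easy to check numerically: shifting all levels of a characteristic by a constant changes the part-worths but not the predicted probabilities. In this sketch the shift amounts (+5.0 and -2.0) are arbitrary values chosen for illustration, and the level indices are read from the encoding table of the example.

```python
import math

def logit_probabilities(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

# Centered part-worths from the example, keyed by characteristic.
centered = {"p": [0.330, -0.330], "z1": [-0.565, 0.565], "z2": [0.474, -0.474]}
# Level index (0 or 1) of price, z1, z2 for products A, B, C, D.
coding = {"A": (0, 0, 0), "B": (0, 1, 1), "C": (1, 0, 1), "D": (1, 1, 0)}

def probabilities(part_worths):
    v = [part_worths["p"][i] + part_worths["z1"][k] + part_worths["z2"][m]
         for i, k, m in coding.values()]
    return logit_probabilities(v)

# Shift every level of price by +5.0 and every level of z1 by -2.0.
shifted = {
    "p": [b + 5.0 for b in centered["p"]],
    "z1": [b - 2.0 for b in centered["z1"]],
    "z2": list(centered["z2"]),
}
probs_centered = probabilities(centered)
probs_shifted = probabilities(shifted)
print(probs_centered, probs_shifted)
```

Both sets of part-worths give the same probabilities, which is why a normalization such as zero-mean levels is needed to report a unique answer.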

The resulting beta values are plotted below for each characteristic and price. Each ζ is divided into only two levels, so we can use linear interpolation to estimate β values for intermediate levels, for example, a price of $18,000.

By including only two levels per ζ, the resulting interpolation shown in the graphs is linear with respect to the real-valued characteristics, and we have essentially assumed a linear relationship. The final interpolated relationship for intermediate values of p and z, using linear interpolation, is

 \hat{v_j}= -0.132 p_j + 0.113 z_{1j} - 0.474 z_{2j}

In this case, the result is identical to assuming a priori that the utility function was linear. If we had tested more than two discrete levels, we could estimate nonlinear utility functions.


Interpolation

To estimate utility for intermediate values of price and the characteristics more generally, it is possible to fit a spline through the part-worth values βζω of all levels ω in each ζ and use it to interpolate at intermediate values of the characteristic. Many types of splines could be used to interpolate the points; however, to facilitate optimization over the real-valued product characteristic values, it is desirable to interpolate using a spline function that is smooth and continuous over the domain. In particular, we will focus on natural cubic splines: a set of Ωζ-1 cubic polynomials, each of which has a domain between two adjacent levels ω (one between ω=1 and ω=2, another between ω=2 and ω=3, etc.) that:

1. Match the value βζω at each of the two domain endpoints ω,
2. Match the first and second derivatives of the adjacent cubic polynomial at each domain endpoint, and
3. Have a second derivative of zero at the extreme bounds of the spline: ω=1 and ω=Ωζ.

An illustration is shown below with Ωζ=4 levels for hypothetical characteristic z:



It is possible to calculate the coefficients of the Ωζ–1 cubic polynomials in a spline for characteristic ζ given βζω by solving a system of equations representing the three conditions; however, we refrain from this detail here. Instead, software packages such as Excel or Matlab can be used to automatically calculate cubic splines given values for βζω. We will notate the cubic spline function for characteristic (or price) ζ that passes through the levels ω of βζω as Ψζ. The interpolated observable component of utility then involves the resulting spline function evaluated at the intermediate, real-valued product characteristics and price:


 \hat{v_j} = \Psi_0(p_j) + \sum_{\zeta>0}\Psi_{\zeta}(z_{\zeta j}) (20)


This interpolated value of v can then be used in the logit model to predict the choice probabilities of new products with intermediate product characteristic and price values.


Example

Suppose that we had included more levels in our earlier example

ζ    symbol    level ω=1    level ω=2    level ω=3
0    p         $15,000      $20,000      $25,000
1    z1        25           35           45
2    z2        6            8            10

and the three separate choice sets below were provided to survey respondents, and their choices were recorded for each choice set.

Choice Set                  A     B     C     None
1            pj ($1000s)    15    20    25    -
             zj1 (mpg)      25    35    45    -
             zj2 (sec)      6     10    6     -
2            pj ($1000s)    15    20    25    -
             zj1 (mpg)      35    45    25    -
             zj2 (sec)      8     6     10    -
3            pj ($1000s)    15    20    25    -
             zj1 (mpg)      45    25    35    -
             zj2 (sec)      10    8     6     -


Suppose 100 people were given this survey and the number of people choosing each option in each set is given by:

Choice Set    A     B     C     None    Total
1             45    5     45    5       100
2             40    5     50    5       100
3             30    25    30    15      100

Given these data, the partworths (centered around zero for each characteristic, as before) can be calculated as

ζ    symbol    level ω=1    level ω=2    level ω=3
0    p         0.64         -0.03        -0.61
1    z1        -0.67        -0.07        0.74
2    z2        0.74         0.57         -1.32


with a no-choice option utility value of -1.829. Interpolating a spline through the levels of price and each characteristic would enable estimation of the part-worth at an intermediate level.
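
To make the interpolation step concrete, here is a minimal natural cubic spline in plain Python, applied to the price part-worths above. It is a sketch rather than the procedure used in the text, which relies on spreadsheet or Matlab routines; the function name and the query price of $18,000 are illustrative.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the interior second derivatives m[1..n-2];
    # the natural end condition fixes m[0] = m[n-1] = 0.
    diag = [2.0 * (h[r] + h[r + 1]) for r in range(n - 2)]
    rhs = [6.0 * ((ys[r + 2] - ys[r + 1]) / h[r + 1] - (ys[r + 1] - ys[r]) / h[r])
           for r in range(n - 2)]
    for r in range(1, n - 2):  # forward elimination (Thomas algorithm)
        w = h[r] / diag[r - 1]
        diag[r] -= w * h[r]
        rhs[r] -= w * rhs[r - 1]
    m = [0.0] * n
    for r in range(n - 3, -1, -1):  # back substitution
        m[r + 1] = (rhs[r] - h[r + 1] * m[r + 2]) / diag[r]

    def evaluate(x):
        # Find the interval [xs[i], xs[i+1]] that contains x.
        i = 0
        while i < n - 2 and x > xs[i + 1]:
            i += 1
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

# Price levels in $1000s and their part-worths from the table above.
price_spline = natural_cubic_spline([15.0, 20.0, 25.0], [0.64, -0.03, -0.61])
part_worth_18 = price_spline(18.0)
print(part_worth_18)
```

The spline reproduces the estimated part-worths at the three price levels and gives a smooth estimate in between, which can then be summed with the other characteristics as in the interpolated utility formula.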
