#############################################################
NOTE FROM CBSU
This document consists of two parts stacked together. Please be sure to
read the second part - it contains important updates about the format
of the input files.
##############################################################

------------
| P A R T 1 |
------------

NOTICE
------
This program may be freely used, copied, redistributed, and modified.
Comments, criticisms etc to m.a.beaumont@reading.ac.uk.

Introduction
------------
This program is designed to help the user explore the most probable
demographic and genealogical histories consistent with a sample of
chromosomes typed at one or more loci. It relies on Markov Chain Monte
Carlo (MCMC) simulation. It is NOT an automatic "black box" method, and
other graphical/statistical programs will be needed to interpret the
output. It should be used in conjunction with the paper "Detecting
expansion and decline using microsatellites". It should be possible (I
hope!) to replicate the results given in the paper using the real and
simulated data sets described there.

The program is very slow. It is not practicable to run it on a Mac
slower than a G3, or on anything slower than a Pentium II. In addition,
it creates large files - of the order of 1 Mb per locus - so make sure
your file system is not full.

The program differs slightly from that described in the Genetics paper.
For the paper I ran the loci individually and then used density
estimation methods to combine across loci, as described there. I think
it would be too much to expect most people to go through that effort,
and have therefore modified the program to run all the loci
simultaneously. The problem with running all the loci simultaneously is
that the acceptance rate can often be quite low, and convergence takes
a long time. I recommend that you first start with small sample sizes
to obtain some idea of how well the method is going to work for your
data set.
To make it easier to work with subsamples of the data I have included a
sampling program called sinf.exe. Given an infile, formatted as
described below, the program allows the user to obtain samples of the
requested size without replacement.

Description of the model
------------------------
A sample of chromosomes typed at a microsat locus can be assumed to
have a genealogical history consisting of coalescence events and
mutation events going back in time until the most recent common
ancestor of the sample. The assumed demographic history is of a single
stable population that was of size N1 chromosomes at some time ta ago
and subsequently changed gradually in size to N0 chromosomes over the
period from ta to the current time. For purposes of estimation the
parameters are scaled in terms of the current population size, giving
two demographic parameters tf = ta/N0 and r = N0/N1. The mutation rate
mu is scaled in terms of the current population size to give
theta = 2*N0*mu. It is assumed that r and tf are the same across loci
and that theta varies among loci.

Two models of population change are assumed - a linear change between
N1 and N0 over the time interval ta, and exponential change. The user
has a choice of which model to use. The program does not allow one to
compare the goodness of fit of each model.

I hope all terminology and jargon used in this paper is defined in the
section "Bayesian terminology and general notes" below. The program
estimates the posterior distribution of parameters describing the
genealogical and demographic history of the sample, given some
rectangular prior distribution for the demographic and mutational
parameters (theta, r, and tf) and the data. By a rectangular
distribution I mean one in which, within the bounds of the rectangle
(an oblong in 3-d), the prior distribution is flat, and outside the
bounds it is 0. Within the bounds of the prior the 3-d posterior
distribution for theta, r, and tf is proportional to the likelihood
surface.
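The scaling just described can be written out explicitly. Here is a
minimal sketch (the function name and example numbers are mine, not
part of msvar):

```python
def scale_params(N0, N1, mu, ta):
    """Convert the natural parameters to the scaled parameters used by
    msvar, following the definitions above."""
    theta = 2.0 * N0 * mu  # scaled mutation rate, theta = 2*N0*mu
    r = N0 / N1            # r > 1 suggests expansion, r < 1 decline
    tf = ta / N0           # time of the size change scaled by N0
    return theta, r, tf

# For example, N0 = 10000, N1 = 100000, mu = 5e-4, ta = 5000 gives
# theta = 10.0, r = 0.1 (a tenfold decline), tf = 0.5.
```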
Note, however, that the marginal posterior distribution for each
parameter will not be proportional to the marginal likelihood within
the prior bounds. This is because the marginal likelihood for one
parameter would be obtained by averaging over the likelihoods for ALL
values of the other parameters, whereas the marginal posterior
distribution is obtained by averaging within the bounds of the prior.

Restrictions on the Model
-------------------------
a) It assumes fewer than 100 loci (you would need a network of Crays!).
b) It assumes that there are no more than 2000 coalescent and
   mutational events in the genealogy. (This can easily be changed, but
   seems reasonable at the moment - email me if there is a problem.)
c) It assumes you have microsatellite data evolving by a single-step
   mutation model.

Data input - Infile
-------------------
The file "infile" contains the microsat data in the format below. The
example file here is for the 8 polymorphic wombat loci described in the
manuscript. There should be no text, characters or extraneous numbers
until after the last locus. Lines have no significance - it is the
sequence of numbers that is important.

8      - number of loci (it will only read this number - so you can
         have notes, other unused loci etc underneath).
2      - number of alleles (allelic classes) at locus
28 20  - counts of chromosomes with the same length
0 7    - lengths corresponding to the counts above (these have to be
         relative to something - I have zero here, but it could be any
         number; they should be in the same units - ie use number of
         repeats)
2      - number of alleles at next locus
.
.
.
2      - number of alleles at last locus
10 46
0 4
anything you want down here.

Control of simulations - init_v_file
------------------------------------
The file "init_v_file" contains information as described below. There
should be no text, characters or extraneous numbers until after the
last required number. Lines have no significance - it is the sequence
of numbers that is important.
13485801  - starting displacement of the random number sequence - more
            details below.
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
          - starting value of theta for each locus. There should be as
            many starting values as there are loci specified in
            "infile".
0.05 1.0  - starting values of r and tf respectively (assumed to be the
            same for all loci).
0 0 0     - indicators showing whether theta, r, tf (respectively)
            should be kept fixed. 1 indicates that the corresponding
            parameter should be kept fixed (ie you are conditioning on
            a fixed value); 0 indicates that it can vary.
-6 1      - the lower and upper bounds on (respectively) log10(theta),
-4 0        log10(tf), log10(r). Note that these are log values but the
-2 1        starting values are NOT logged.
1         - indicates whether linear change (0) or exponential change
            (1) is to be modelled.
20000     - the maximum number of thinned (see below) update steps. Ie
            in this example the total number of update steps is
            20000 * 10000 = 200000000.
10000     - the degree of thinning - ie the interval (in update steps)
            between successive lines of output.
0.5       - a scale parameter for the MCMC simulation. It is difficult
            to work out a fully automatic choice, so it is left to the
            user. For many data sets a value similar to the reciprocal
            of the square root of the number of loci seems adequate.

Notes. If the maximum number of thinned update steps is set to be
greater than 2^31 you'll have problems of integer representation. For
multilocus data sets the number of steps is evenly divided among the
loci, so the number of updates per locus will be 200000000/8 in the
wombat example here, and the number of updates between successive lines
of output will be 10000/8 per locus.

Running the program
-------------------
Double click on the application. There is no Windows GUI, so it will
create an MS-DOS window. Typing control-C or closing the window will
kill the program. No indication is given on the screen of how far the
program has got (we are usually talking hours or days).
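To make the bookkeeping in the init_v_file notes concrete, here is the
arithmetic for the wombat example as a short sketch (my code, not
output from the program):

```python
import math

n_loci = 8            # as in "infile"
thinned_steps = 20000 # maximum number of thinned update steps
thinning = 10000      # update steps between lines of output

# Total update steps = thinned steps * thinning.
total_updates = thinned_steps * thinning        # 200000000

# For multilocus data the steps are divided evenly among the loci.
updates_per_locus = total_updates // n_loci     # 25000000
interval_per_locus = thinning // n_loci         # 1250 steps per locus
                                                # between output lines

# Rule-of-thumb scale parameter: 1/sqrt(number of loci).
scale = 1.0 / math.sqrt(n_loci)                 # about 0.35
```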
To see how much it has done you need to look at the output files while
the program is running. I suggest the safest way to do this in Windows
is to copy/paste a file (e.g. out1) to make a copy (e.g. 'copy of
out1') and then look at that.

I suggest running the program on the test data set to check that it
works and to see what the output looks like. The infile given here is
for the wombat data analysed in Beaumont (1999). If the program is run
correctly it should be possible to replicate the posterior distribution
of r and tf for the polymorphic loci in figure 6.

Using sinf.exe
--------------
Double click on sinf.exe (or run it from a DOS prompt to get warnings
if the requested sample size is larger than the actual size). It
prompts for a sample size. It then produces two files - one called
"sampfile", which contains the requested sample, and the other called
"remainderfile", which contains the remainder. "sampfile" can then be
renamed "infile" to run with msvar (making sure you keep a copy of the
original infile).

Random Number Generation - INTFILE
----------------------------------
The program uses a rather ancient Generalized Feedback Shift Register
(GFSR) random number generator devised by Lewis and Payne in the 1970s.
The advantages of using this are that it is very fast, has an immense
cycle length, and has good statistical properties. One disadvantage is
that it requires a large file describing the current state of the
generator - INTFILE - which is regularly updated during and at the end
of an MCMC run. The other disadvantage is that it is not possible
(meaning I don't know how to do it) to choose a random starting point
in the sequence. If all runs were done sequentially this would not be a
problem: you would just use the INTFILE from the end of the previous
run. However, for runs in parallel you want to start with different
INTFILEs (otherwise you'll replicate the exact same MCMC sequence).
To overcome this, the first line in init_v_file specifies a number of
random numbers to generate before starting. This number must be less
than 100,000,000 (an arbitrary upper limit). The program will walk
through this number of random numbers before doing the MCMC
simulations. This is somewhat unsatisfactory because it means that
parallel MCMC runs will probably share a proportion of the random
number sequence, but given the high degree of contingency ("chaos?") in
the simulations this does not lead to obvious correlation in the MCMC
outputs between parallel runs (arms waving madly). The program gives a
message that it is starting the MCMC simulation after it has run
through the initial number of random numbers, so that one can get an
idea of how long it takes to get through them. I suggest staggering the
starting points by quite a large number in parallel runs, just to avoid
the chance of the program being sucked into the same sequence.

Output files - out1...n
-----------------------
Each locus has its own output file, labelled 1..n, referring to loci in
the order they are given in infile. There are 10 columns:

Column 1  The line number, starting from 0. (This number plus 1)
          multiplied by the thinning number is the number of update
          steps carried out to reach that point.
Column 2  The number of mutations in the genealogical history of the
          sample.
Column 3  The time of the most recent common ancestor of the sample.
Column 4  Log10(theta)
Column 5  Log10(r)
Column 6  Log10(tf)
Column 7  Log probability of the genealogical history.
Column 8  The number of mutations in the tips of the genealogy divided
          by the total number of mutations in the tree. By "tip" I mean
          the lineage leading from a current individual chromosome to
          the first coalescence involving that lineage.
Column 9  The total branch length of the tips of the genealogy (defined
          above) divided by the sum of all the branch lengths.
This should be very similar to Column 8 when the number of mutations in
the genealogy is large (because the distribution of mutations in the
genealogy is proportional to branch length). This is what I call the
Distal Branch Index in the ms.

Column 10 The proportion of coalescences that are above tf.

These can be regarded as correlated samples from the posterior
distribution. Read the notes below before doing anything with them.

General notes on Convergence
----------------------------
The key idea in MCMC is that the stationary distribution of the Markov
chain is the same as the required posterior distribution. It is
important to run the chain for a sufficiently long time that your
sampled points (bearing in mind that they are strongly correlated)
provide a good representation of the posterior distribution.

The simplest informal way to assess this is to run a number of
independent chains with DIFFERENT starting parameter values and
visually compare the marginal posterior distributions for key
parameters. Note that the starting parameters should be DIFFERENT,
because if you start them out at the same point, different chains may
be similar simply because of the shared starting point.

Another informal way to check for convergence is to look at a plot of,
say, log(theta), log(r), log(tf) against time. If the chain is very
"sticky" and not "mixing" well (MCMC jargon) it should become apparent:
the movement through parameter space will be very lumpy. Essentially,
if it is mixing well, any visual features of the plot should be
repeated a very large number of times, so there are no large-scale
features. By large-scale feature I mean a trend, an oscillation, or
flipping between two regimes. In a well-mixing system the plot will
appear very spiky, with the general appearance in one small part of the
sequence (say a tenth) looking very similar to another small part.

More formal ways of assessing convergence are provided in the CODA
package for Splus and R.
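For the informal checks above you need the chains in a usable form.
Here is a minimal loader for the out1..n files (my sketch, not part of
the distribution; the column names are mine, and it takes any iterable
of lines, so an open file works directly):

```python
# One name per column, in the order documented above.
COLUMNS = ["line", "n_mutations", "tmrca", "log10_theta", "log10_r",
           "log10_tf", "log_prob", "tip_mutation_frac",
           "tip_length_frac", "ancestral_coal_frac"]

def load_out(lines):
    """Parse the 10 whitespace-separated columns of an out file into a
    dict mapping column name -> list of values."""
    rows = [list(map(float, ln.split())) for ln in lines if ln.strip()]
    return {name: [row[i] for row in rows]
            for i, name in enumerate(COLUMNS)}
```

For example, `with open("out1") as f: chain = load_out(f)`, then plot
`chain["log10_r"]` against `chain["line"]` to inspect mixing.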
The two most commonly used methods are the Raftery-Lewis method, which
uses a single chain and estimates how long a run you need to estimate
certain quantiles to a required degree of precision, and the
Gelman-Rubin method, which uses independent chains and quantifies the
variability in the mean value of a parameter within and between chains
(on the basis that at equilibrium the mean value should be the same
between chains).

In my experience the Raftery-Lewis statistic is depressingly
conservative for the coalescent MCMC described here (especially when
the possibility of population growth is allowed). What I mean by this
is that it will indicate that individual runs are too short even
though, if you do independent runs from different starting positions,
you end up with fairly similar posterior distributions, as verified by
the Gelman-Rubin statistic. The reason is that it was developed for
Gibbs samplers (a type of MCMC) where in general the degree of
autocorrelation is low(ish); it is designed to be conservative and
becomes more conservative as the degree of autocorrelation increases.
The degree of autocorrelation in the coalescent MCMC (particularly with
population growth) is many orders of magnitude more than assumed by the
statistic. Much more useful, in my experience, is the Gelman-Rubin
statistic.

However, much though it offends our sensibilities, there is still an
'art' to assessing convergence, and even if you don't have access to
the CODA package you can get a reasonable idea of whether convergence
has been attained by looking at the output of independent runs and
visually inspecting the stickiness of the chains. But beware: there
will always be the possibility that if you ran the chain 1000 times
longer it would discover a whole new region of parameter space, and
your apparently well-mixing sequence would turn out to be poorly mixing
on a very large scale. You can never, no matter what statistics are
used, circumvent this possibility.
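If CODA is not to hand, the Gelman-Rubin diagnostic is simple enough to
sketch directly. A minimal version for a single parameter (my own
implementation of the standard formula, not code from CODA or msvar):

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter, from several
    independent equal-length chains (burn-in already removed).
    Values near 1 are consistent with convergence; values much greater
    than 1 are not."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    # Between-chain variance of the means, and mean within-chain variance.
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Chains sampling the same distribution give a value near 1; chains stuck
in different regions give a much larger value.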
My advice is roughly as follows. Assuming you have multilocus data,
choose the locus with the highest amount of variability and the largest
sample size (with more weight on the former). Run it by itself for,
say, 10^8 updates. Based on the output of this, ignoring the first 10%
of the points (which may be affected by the starting values and not be
representative of the equilibrium distribution), choose starting values
of theta, r, tf that are at the edges of (but not too far from) the
observed posterior distribution (you may need to look at the 3
bivariate marginals to get a good idea). Use these to start off another
4 independent chains. Cut off the first 10% of each chain. Plot
histograms of the marginal distributions of the parameters. Do all 5
coincide reasonably well? How short can each chain be and still give
adequate estimates (it doesn't matter if the histograms are noisy -
it's the relative displacement of the densities that is important)?

Once you have decided on this minimum number, multiply it by a number
that is rather less than the number of loci. Do a run with all the
loci. Look at the output. Choose starting values of the parameters for
another 4 chains based on the posterior distribution (all being well,
this should be much tighter than for the 1-locus case). You may decide
at this stage that you can get away with shorter chains. Run it again.
Compare the output from the 5 independent chains. If you are happy that
all is well, you could combine all the output together. Alternatively
(and probably better for publishing results) you can estimate summary
statistics from each chain individually and then use the variability
between chains to give standard errors on these estimates.

Specific Notes on Convergence
-----------------------------
The method converges well for both single- and multilocus data when
they support a model of population decline. Convergence is poorer for
expanding populations.
If there is strong cross-locus support for expansion, a multilocus
linear model will not converge in a reasonable time. An exponential
model will work better. In the case of the linear model it then becomes
necessary to consider running loci independently, estimating densities
separately, and then multiplying them together as in the Genetics
manuscript. A tell-tale signal that the model is not converging well in
a growing population is when regions of parameter space are explored
where log(r) changes greatly but log(tf) varies only a little.

In general the method will converge more slowly if:
a) Sample sizes are large (more than 200 chromosomes per locus is not
   recommended).
b) Samples are very polymorphic or have a wide difference between
   minimum and maximum allele sizes. This implies a lot of mutations.
   One restriction placed on the model is that coalescent nodes plus
   mutations must not exceed 2000 in the genealogical history.

A note on estimating summary statistics from MCMC output
--------------------------------------------------------
Many summary statistics, such as the posterior mean, median, and
quantiles, can be estimated using standard statistical methods
implemented in many packages, and I will not go into them here.
Estimation of the mode and HPD limits requires density estimation. Many
packages will do this for you too. Often these use kernel density
estimation procedures that assume no boundaries and, since the
posterior distributions here are bounded, this can lead to problems
(the density will always be strongly underestimated at the edges). In
my view the best density estimation package available is called Locfit
(it's free). It is implemented as a standalone package or (my
recommendation) as a library for the statistical packages
(interpreters? languages?) Splus and its free lookalike R. If you want
to estimate modes, HPD limits, and make pretty plots, I strongly
recommend using Locfit.

Bayesian Terminology and general notes
--------------------------------------
Prior distribution - the probability distribution of the parameters
independent of any data. This has to come from somewhere (you).

Likelihood - the probability of the data given a set of parameter
values.

Posterior distribution - the probability distribution of the parameters
conditional on the data. The posterior distribution is proportional to
the likelihood multiplied by the prior. It is only proportional because
we need to be able to standardize the volume so that it is 1 - ie to
give us a probability distribution. This integration is often
difficult, which is where MCMC comes in (see below).

Conditional posterior distribution - the (joint) distribution of one
(or more) parameter(s) when the other parameters are fixed at certain
values. This is like taking a slice through the distribution and
normalizing it to have volume 1.

Marginal posterior distributions - in many models there are parameters
that are a necessary part of the description of a model but in which we
have no particular interest. Again, the model may have many parameters
that we are interested in, but we may only want to say something about
one parameter at a time rather than about all the parameters jointly.
However, if we fixed the parameters that we were not immediately
interested in to some set of values, we might find that the posterior
distribution for the parameter we are interested in (the conditional
distribution) changes according to the values we assign to the other
parameters. Since we don't know the values of these parameters, but
have varying degrees of belief about what they might be (given by the
joint posterior distribution), it is reasonable to estimate the
posterior distribution for the parameter we are interested in by
summing (integrating), for each value of this parameter, the
frequencies over all possible values of the other parameters. This is
the marginal distribution.
Another way of looking at it is as an average of all the conditional
distributions, each scaled to have volume 1. The shape of the marginal
distribution will be more strongly influenced by the shape of the
conditional distributions in high-density regions than in low-density
regions.

Improper prior distribution - an example of an improper prior is to
assume that all parameter values (from, say, -infinity to +infinity)
have equal prior frequency (a flat prior). This distribution is
improper because it has an infinite volume. However, if the likelihood
function goes to 0 at extreme values of the parameters, then an
improper prior multiplied by the likelihood function can give rise to a
proper posterior distribution (ie one that can be normalized to have
volume 1).

Relationship between posterior distribution and likelihood function -
if a flat prior is used, the posterior distribution will be
proportional to the likelihood function. (It will not be the same,
because it will be scaled to have volume 1.)

Summary statistics - in the Bayesian world view, all that we can say
about the model (conditional on the data) is given by the posterior
distribution. We need to report summary statistics based on this
distribution. In general, point estimates are not as useful as interval
estimates. Examples of the former are the mode, mean, and median of the
posterior distributions. An example of the latter is the equal-tail
probability interval - ie reporting, say, the 0.025 and 0.975 quantiles
of a marginal posterior distribution for some parameter. Another useful
summary (which I like, but which is more difficult to estimate) is the
HPD (highest posterior density) interval. This is like a water level
chosen so that, say, 95% of the posterior probability lies above it.
Many biologists influenced by Edwards' "Likelihood" might like to use
an HPD limit such that the log density at the limit is log(2) less than
the log modal density (at the maximum likelihood estimate) - so-called
support limits.
From a Bayesian point of view this is not a very good summary, because
you could have a substantial posterior probability outside this limit
depending on the thickness of the tails of the posterior distribution.
Graphical methods of displaying posterior distributions are also very
useful - they give good visual summaries.

MCMC - Markov Chain Monte Carlo. This is a method of integrating the
likelihood multiplied by the prior. In effect you simulate
(autocorrelated) random samples from the posterior distribution. There
are two key aspects to the use of MCMC. One is this integration. The
other is that it automatically gives you marginal distributions: you
just ignore the other values. It's like estimating the distribution of
people's heights. You could have measured things like sex, big-toe
length, earlobe size etc. The distribution of heights that you estimate
is a marginal distribution over all the other things you could have
measured. In my view the ability to report these marginal distributions
is the key feature of this program.

Final Note on msvar
-------------------
This program should allow you to replicate the results in the
manuscript. The purpose of the paper is primarily to analyse the
likelihood surface for models of population growth. Because the
likelihood asymptotes for extreme values of all the parameters, it
seemed sensible to use rectangular priors. Within these limits the
joint posterior and the conditional posteriors for the parameters are
proportional to the likelihood. The marginal posterior distribution is
not proportional, for reasons discussed above. However, using
rectangular priors makes it easier to understand the patterns obtained.
The main aim of the paper is to try to understand the likelihood
function, even though we can't write it down. Using priors that are not
rectangular makes it more difficult to distinguish the effects of the
prior from those of the likelihood.
This does not mean, however, that the use of rectangular priors is in
general desirable. For real data sets you do not have clearly defined
boundaries on the parameter values within which all values have equal
probability and outside which they have zero probability. Rather, the
prior distribution is more reasonably represented by a smoothly varying
function.

A second criticism of msvar is that we are interested in the natural
parameters, N0, N1, mu, ta, but the likelihood function is only a
function of 3 of them scaled by the fourth (I have used N0 here). Thus,
given a posterior distribution for tf, we might want to integrate over
some prior for N0 to estimate ta. In addition (especially if the
populations have been growing) we might prefer to have looked at mu*ta
rather than theta, and then estimate ta by integrating over some belief
about how mu varies among loci. However, from a Bayesian perspective it
makes sense to abandon these arbitrary parameterizations and use the
natural parameters directly (Tavare' et al., 1997; Wilson and Balding,
1998). Of course this means that there will be strong correlations
between the parameters (the model is over-determined), but it does have
the advantage of giving estimates for the parameters that we are
interested in without further guesswork.

Useful references
-----------------
Beaumont, M.A. (1999) Detecting population expansion and decline using
microsatellites. Genetics, 153: 2013-2029.
Gelman, A. et al. (1995) Bayesian Data Analysis. Chapman and Hall.
Gilks, W.R. et al. (1996) Markov Chain Monte Carlo in Practice. Chapman
and Hall.
Tavare' et al. (1997) Inferring coalescence times from DNA sequence
data. Genetics, 145: 505-518.
Wilson and Balding (1998) Genealogical inference from microsatellite
data. Genetics, 150: 499-510.
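Returning briefly to the summary statistics discussed above: an HPD
interval can be approximated directly from the MCMC samples as the
shortest interval containing the required posterior mass. A rough
sketch of that idea (my code; this is not how Locfit does it - Locfit
smooths the density first):

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing the given fraction of the samples:
    a crude sample-based HPD estimate for a unimodal marginal."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # points the interval must cover
    # Slide a window of k consecutive order statistics and keep the
    # narrowest one.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]
```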
** I strongly recommend analysing the output using the R package:

http://www.ci.tuwien.ac.at/R/contents.html
        The Comprehensive R Archive
http://www.ci.tuwien.ac.at/R/src/contrib/PACKAGES.html
        where to get CODA for R
http://www.mrc-bsu.cam.ac.uk/bugs/docs/docs.html
        documentation for CODA
http://cm.bell-labs.com/cm/ms/departments/sia/project/locfit
        all you want for Locfit
http://www.stats.bris.ac.uk/MCMC
        the MCMC preprint service at Bristol (also has manuscript
        versions, in postscript, of already published papers)

------------
| P A R T 2 |
------------

HISTORY
-------
The original Windows executable was compiled with mingw in June 2002.
It was recompiled in June 2004 with Borland C++. This necessitated a
small change in the code to cope with the Borland fp system (to make it
IEEE compliant). Also, a bug in printerr2 in myutil.c (opening already
open files) was corrected. This rough draft readme was updated June
2004.

README 14 June 2002. Updated 30 June 2004.

This needs to be read in conjunction with the readme for msvar0.4.x.
The basic structure of the program is the same. The parameterisation,
however, is now in terms of the effective sizes and times etc rather
than the scaled parameters theta, r, and tf. Run it in a similar way to
msvar0.4.x.

The program now outputs a new file called hpars.dat, which has the
following columns:

1  output line number
2  (ignore)
3  total log likelihood (over all loci)
4  mean current population size (over loci)
5  variance (among loci) of current population size
6  mean ancestral population size
7  variance of ancestral population size
8  mean mutation rate
9  variance in mutation rate
10 mean time (in years) since the population started to decline/expand
11 variance of the time since the population started to decline/expand

It is these parameters that you are going to want to report. The
parameters for the individual loci are in out1..n as before.
In these files the demographic parameters will vary between loci
because, this time, the model assumes that there is some variation in
each parameter among loci (your prior for the amount of this variation
can be set in the parameter file). In principle, you could detect
selection if one locus is behaving wildly differently from the others
for, e.g., growth rate.

So, to summarise, each out1..n file contains:

line number
number of mutations
tmrca
log10(N0) for that locus
log10(N1) for that locus
log10(mu) for that locus
log10(tfa) for that locus
log likelihood
proportion of mutations in terminal lineages
proportion of total tree size in terminal lineages
proportion of coalescences in ancestral pop

NB: don't expect the 'means' and 'variances' in hpars to be exactly the
same as those calculated across out1..outn - the latter are assumed to
be random draws from distributions with means and variances given in
hpars.

The init_v_file is changed. It has the following structure:

number of 'turns' of the random number generator
ploidy number
generation time
starting values for current size for all loci
starting values for ancestral size for all loci
starting values for mutation rate for all loci
starting values for time since decline/expansion for all loci
indicators (0,1) whether to update values of these parameters
starting values for prior mean and variance for current size
..... ancestral size
..... mutation rate
..... time
hyperprior mean and variance for means and variances for:
    ancestral size
    current size
    mutation rate
    time since decline/expansion
an indicator as in msvar0.4.x - 0 = linear growth; 1 = exponential
number of lines of output
number of iterations between lines of output (the example gives 10000,
but you will probably want this 10 or 100 times larger to get good
convergence with large numbers of loci or sample sizes)
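The hpars.dat columns listed above can be pulled into named columns
with the same kind of sketch as for the out files (my code; the column
names are mine, not the program's):

```python
# One name per hpars.dat column, in the documented order.
HPARS_COLUMNS = ["line", "ignore", "log_lik", "mean_N0", "var_N0",
                 "mean_N1", "var_N1", "mean_mu", "var_mu",
                 "mean_time", "var_time"]

def load_hpars(lines):
    """Parse the 11 whitespace-separated columns of hpars.dat into a
    dict mapping column name -> list of values."""
    rows = [list(map(float, ln.split())) for ln in lines if ln.strip()]
    return {name: [row[i] for row in rows]
            for i, name in enumerate(HPARS_COLUMNS)}
```

For example, `with open("hpars.dat") as f: h = load_hpars(f)`, then
summarise and report, say, `h["mean_N0"]` and `h["mean_time"]`.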
Further points to note
----------------------
There is an arbitrary upper limit on the number of events in a
genealogy, set at 1995 (events = number of mutations + sample size
- 1). Any trial update that makes the genealogy bigger than that is not
accepted (think of it as a prior on the number of mutations
permissible). This is done just because it is computationally
convenient to set, at the outset, the maximum 'size' of the genealogy,
and 2000 seems a reasonable number.

** If this limit is reached when setting up the initial genealogy at
the start of the run, the program stops. Otherwise, if it is reached as
part of a trial update, the update is simply not accepted and the
program continues to run. Generally the latter is a rare occurrence -
most problems occur at the beginning because of the setting of the
initial parameter values.

** To solve the problem if it occurs right at the beginning, you just
need to set the initial values to something that won't give initial
genealogies with too many mutations (e.g. just keep adding an extra 0
to the mutation rates for each locus until the problem goes away).