Be skeptical. Be very skeptical.

In recent months, we’ve had a few slip-ups by the official statistical system in India:

  • Yesterday’s IIP release was preceded by a mistake. Mint says: “On Monday, the government was guilty of a similar error in its factory output data. Till it corrected the number pertaining to capital goods output, analysts were left scrambling for explanations as to how this had grown 25.5% while overall factory growth had shrunk 5.1%.” (The answer: it hadn’t, and had actually shrunk by 25.5%.)
  • On 9 December, we discovered there were important mistakes in the exports data.
  • In December 2010, RBI modified the numbers that it releases about its trading on the currency market.
  • In September 2010, there was a mistake in the quarterly GDP data released by CSO.
These examples are part of a larger story: the problems of the official statistical system. The Indian statistical system is afflicted by three levels of problems:
  1. The first level is conceptual problems and analytical errors. For example, the weights of the WPI basket are wrong, and the estimation methods used in the IIP are likely to be wrong. Quarterly GDP measurement has no demand side, because that would require a quarterly household survey, which the government does not know how to run.
  2. The second level is the lack of rugged IT systems. The production of statistics requires high-quality enterprise IT systems, and the government has neither the ability nor the incentive to roll these out. For example, the September 2010 mistake in quarterly GDP data seems to have come about because quarterly GDP data is produced in a spreadsheet. As with all heavy use of spreadsheets, this is highly error-prone.
  3. The third level is the problem of truant front-line staff. In a country that cannot get its civil servants to show up at school and teach, it is not surprising that the front-line staff of statistical agencies cannot be trusted to go out into the field and fill out survey forms.
The mistakes that we’re seeing are merely a reflection of #2 (the lack of rugged enterprise IT systems). But much more is going on that holds back the usefulness of official statistics.
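Problem #2 is the one that software addresses most directly. As a minimal sketch of the kind of automated check a scripted production pipeline (as opposed to a hand-edited spreadsheet) can run before every release, the hypothetical function below recomputes year-on-year growth from the underlying index levels and flags any published figure that disagrees. The function names, the tolerance and the index levels are illustrative assumptions, not actual IIP data; the toy numbers are chosen so that capital goods reproduce the sign error in the Mint example.

```python
from typing import Dict, List, Tuple

# Each entry: (index level this year, index level a year ago, published growth in %).
Series = Dict[str, Tuple[float, float, float]]


def growth_pct(current: float, previous: float) -> float:
    """Year-on-year growth in percent, recomputed from index levels."""
    return (current / previous - 1.0) * 100.0


def find_inconsistent_growth(series: Series, tolerance: float = 0.1) -> List[Tuple[str, float, float]]:
    """Return (name, recomputed growth, published growth) for every series whose
    published growth differs from the growth implied by its levels by more than
    `tolerance` percentage points."""
    problems = []
    for name, (current, previous, published) in series.items():
        recomputed = growth_pct(current, previous)
        if abs(recomputed - published) > tolerance:
            problems.append((name, round(recomputed, 1), published))
    return problems


# Hypothetical index levels (not real IIP numbers): capital goods actually fell
# 25.5%, but the draft release reports +25.5%.
draft_release = {
    "overall": (94.9, 100.0, -5.1),
    "capital goods": (74.5, 100.0, 25.5),
}

for name, recomputed, published in find_inconsistent_growth(draft_release):
    print(f"{name}: levels imply {recomputed}%, release says {published}%")
```

A check like this costs nothing to run on every release, which is the point: in a spreadsheet, the same consistency test depends on someone remembering to do it by hand.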
Government officials in this field have pinned a lot of hope on implementing the report of the National Statistical Commission (chaired by C. Rangarajan, 2001). I am personally not optimistic about this. The report seems to lay out an incremental agenda for building the statistical system, one that favours the interests of the incumbents. What is required is a ground-up rethink of the statistical system, from first principles, that addresses the three problems above.
Turning to the users of official statistics, most economists attach enormous prestige to labels like GDP, IIP and CPI. But in India, we cannot unthinkingly use numbers just because they come with the label ‘GDP’ from some government agency. We always have to ask skeptical, first-principles questions about how the data is generated. All too often, the standard Indian government data is useless.
Of the government datasets that I know, I feel the CPI is reasonably okay. The WPI is a fairly useful database about prices but useless as a price index. The quarterly GDP data, the IIP, and the NSSO and ASI data are untrustworthy.
Decision makers in government and in the private sector need to grapple with these issues, thinking carefully about which statistics they allow to influence their decisions. Academic users of data need to be much more careful to avoid garbage-in, garbage-out problems.
For more on this subject, you might like to look at the label ‘statistical system’ on this blog.

Project Tanzanite: Obtaining fundamental progress in the macroeconomics of developing countries

I was at a meeting in London recently, organised by the IGC, on the subject of the research agenda in macroeconomics for developing countries. This made me think about how to make progress.

The US as the shared dataset for mainstream macroeconomics

All existing knowledge on macroeconomics is rooted in data about the US economy. The US is seen as a canonical developed country. Economists all over the world have treated it as a common object of study when building macroeconomics: it is a shared dataset. Researchers and Ph.D. students routinely pull a paper from the literature and replicate its results as a first step toward offering innovations, and using the US as a shared dataset makes all this convenient. New work is generally obliged to demonstrate value-add in the context of the US dataset.

The US works as a shared dataset because it has high-quality data. Good-quality data starts right after 1945: since there was no wartime destruction within the country, the early post-war years are not distorted by unusual reconstruction. Apart from a steady shift away from dirigisme from 1945 onwards, there has been no regime change: nothing comparable to the breakdown of communism or the rise of the European Union or the euro has taken place.

In the US, a high-quality statistical system has produced good aggregate data. Organisations like the NBER have processed this data carefully to create datasets about the business cycle. High-quality datasets are available on households, firms and financial markets. Household- and firm-level data has been used to obtain numerical values for the parameters of macroeconomic models: why estimate something using macro data when you know it from gigantic and well-trusted micro datasets? Finally, the major question for macro today is the fusion with finance, and the US has good data on the financial system.

As a consequence, facts about the US are the shared dataset used in all mainstream macro research across the world.

The insights developed in this literature, which has examined the US economy, have been transported with fair success to other developed countries. Thus, the emphasis on the US as a common dataset has delivered good results. For example, the revolution in monetary policy thought through by Friedman, Lucas and others was built using US data, and it has usefully reshaped central banks worldwide. US data was essential for inventing inflation targeting, yet inflation targeting has worked well outside the US.

The major obstacle to building a macroeconomics for developing countries

The major obstacle that interferes with doing macroeconomics in developing countries is data.

India is a good example of what goes wrong. The standard GDP data is in bad shape. The annual GDP data is deplorable, and the quarterly GDP data that is so essential for doing macroeconomics is worse. The IIP is untrustworthy. Put these together, and we don’t really have an output series.

The BOP data is measured fairly well. Some plausible inflation data is now starting to come together. The statistical system run by the government does not produce seasonally adjusted data. Since the Bond-Currency-Derivatives Nexus is absent, most of the interest rate data that is required is missing; policy makers are flying blind. The standard household survey (NSSO) is in bad shape: it does not produce panel data, surveys are conducted only once every few years, and there are incentive problems with the front-line staff who interact with households.

Large firms are observed through the CMIE database; small firms are not observed through the ASI dataset. The CMIE household survey is starting to generate knowledge about households, but it only got started a few years ago. While the CMIE datasets (on firms and households) can be aggregated up to create many interesting macro series, so far this has only been done in a small way.

Faced with these problems, it is not surprising that little is known, at present, about macroeconomics in India. We know of numerous important questions, and we know that we don’t know the answers. The road to progress is often, though not always, blocked by data constraints.

Many such problems bedevil the statistical system in other developing countries also.

Economists have complained about bad data in developing countries for decades, and that hasn’t changed things. And there is a uniquely perverse problem: incremental progress with a gradually improving statistical system does not get the job done. By the time a country gets to good institutions and thus a good statistical system (e.g. Taiwan, South Korea, Israel, Chile), it is no longer a developing country, and is thus not a useful dataset for studying the macroeconomics of developing countries. Chile has world-class databases on households and firms, but you can’t extract microeconomic facts from these datasets and use them in calibration if your object of inquiry is the canonical developing country.

A proposal

How can we make progress? I feel the first idea that we need to agree on is that we do not need many developing countries to build a great literature. We need a shared dataset, a lingua franca, a replication platform, on which we will build a literature. We need a country that will play the role, for the macroeconomics of developing countries, that has been played by the United States in conventional macroeconomics.

The second idea is that we should be a little more ambitious. We should not merely sit around hand-wringing, complaining about a problem that isn’t going to solve itself. When scientists in other disciplines identify questions that call for evidence, they write funding proposals (sometimes running to billions of dollars) and organise themselves to create those datasets. Could we do similarly?

Specifically, imagine that we pick one canonical developing country. It has to be a typical developing country in most respects. It should not be a conflict zone: it should have the basics of law and order and physical safety, so that operations can be mounted in it. Christopher Adam of Oxford suggests that Tanzania is a good choice.

Imagine that the system of interest (a developing country) keeps running, but gets instrumented up to world class. In essence, we try to place first-world instrumentation into a third-world country. (To the extent that this data improves decision-making in the country, we would suffer from ‘Heisenberg’ effects: the act of measurement would change the system being measured.)

This will call for financial resources and, more importantly, organisational capability. The physicists know how to organise themselves to build the Large Hadron Collider. Most of the time, economists do not organise themselves as laboratories or teams doing complex projects. This will be a bridge that we will have to cross.

As with the Large Hadron Collider, this is not a short-term project. It is a project that needs to run for 25 years in order to generate a strong dataset.

At first, the project will generate useful facts for calibration, drawing on household surveys and firm databases. Gradually, as the time series lengthen, the full picture will start to become clear.

If this works, it can ignite a literature in which researchers from all across the world do replicable work off a common dataset. Perhaps Tanzania could then play a role, for the macroeconomics of developing countries, comparable to the role played by the United States in mainstream macroeconomics.

How Will We Know When The Bottom’s In Place?

Okay, so the U.S. is solidly in recessionary territory. The fundamental economic data are lousy, trends are down, consumers and businesses are retrenching, and nobody is happy. We know that, if current forecasts are accurate, the fourth quarter of 2008 will be the worst in terms of economic performance and at least the two following quarters aren’t going to be all that pretty, either.

What we now want to know is, how will we know when the worst is over? What signposts will show when the bottom has been reached and the light ahead isn’t the proverbial catastrophic train wreck heading right at us?

Watch stock market indices such as the Dow Jones Industrial Average and the S&P 500. Market movements can influence the economy, but because stocks are forward-looking, they tend to move first and can therefore serve as a leading indicator (explained below) of what people call “the real economy.” An examination of market movements during previous recessions shows that stocks tend to start their rally about four to six months before continuing claims for unemployment benefits reach their peak.

For example, in the 1981–82 recession, continuing claims peaked at 4.713 million in November 1982, but the Dow Jones had already bottomed out in August and rose more than 300 points between that low and the November peak in claims. The pattern repeated in 1990–91 and in 2001. So when the DJIA stops jittering between 8000 and 9000 and begins a sustained uptrend, it will be a good indication that a bottom is forming in the labor market, and therefore in the real economy too.
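The lead described above can be measured mechanically: find the month in which the stock index bottomed, find the month in which continuing claims peaked, and count the months in between. The sketch below does this for two monthly series. Only the timing (a stock trough in August 1982 and a claims peak in November 1982) comes from the text; the series values are synthetic, in arbitrary units, and the function names are hypothetical.

```python
from datetime import date
from typing import Dict


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (positive if end is later)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def lead_in_months(stocks: Dict[date, float], claims: Dict[date, float]) -> int:
    """Months by which the stock-index trough precedes the claims peak."""
    stock_trough = min(stocks, key=stocks.get)
    claims_peak = max(claims, key=claims.get)
    return months_between(stock_trough, claims_peak)


# Synthetic monthly series in arbitrary units (not historical data); only the
# timing of the trough and the peak follows the text.
months = [date(1982, m, 1) for m in range(6, 13)]  # June..December 1982
stock_index = dict(zip(months, [96.0, 94.0, 90.0, 98.0, 104.0, 107.0, 110.0]))
continuing_claims = dict(zip(months, [80.0, 84.0, 88.0, 93.0, 97.0, 100.0, 98.0]))

print(lead_in_months(stock_index, continuing_claims))  # prints 3
```

A real analysis would use the full published series and would need a rule for ties and for noisy, multi-month bottoms; this sketch simply takes the single lowest and highest months.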

Watch the leading indicators. Most economic announcements concern lagging indicators, which trail the economic activity they measure by weeks or months. While that delay is necessary to allow for tabulation and calculation, last month’s industrial production figures (down 0.6% in November) are old news at best.

On the other hand, some economic indicators are designed to show not where the economy has been, but where it seems to be going. These leading indicators are often surveys of consumers or business managers, most of whom know their budgets to a hair, so the published readings reflect their financial expectations for the coming months. The best leading indicators are consumer confidence surveys, such as those published by the Conference Board and ABC News, and commercial indicators such as business confidence surveys, purchasing managers’ indices, and industrial new orders data.

The two very best are the U.S. Leading Economic Index (LEI) of the Conference Board and the Purchasing Managers Index (PMI) of the Institute for Supply Management. The LEI draws on forward-looking data such as building permits, interest rates, manufacturers’ new orders, stock prices, and initial claims for unemployment benefits, and combines them into a single headline figure for easy comparison. It is currently at a level not seen since 1991 and has fallen 3.7% from this time last year.
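The step of combining many components into one headline figure can be illustrated with a simplified composite index. The sketch below follows, as I understand it, the general approach the Conference Board describes for its composite indexes: compute each component's symmetric month-on-month percent change, weight those changes by standardization factors that give less volatile components more weight, sum them, and cumulate the result into an index level. The component names, their histories and the starting value of 100 are hypothetical; every component is treated as a positive level (the real LEI handles some components, such as the interest rate spread, differently); and this is not the Conference Board's production methodology.

```python
from statistics import pstdev
from typing import Dict, List


def symmetric_change(previous: float, current: float) -> float:
    """Symmetric percent change: 200 * (x_t - x_{t-1}) / (x_t + x_{t-1})."""
    return 200.0 * (current - previous) / (current + previous)


def changes(levels: List[float]) -> List[float]:
    """Symmetric percent changes between consecutive observations."""
    return [symmetric_change(prev, cur) for prev, cur in zip(levels, levels[1:])]


def standardization_factors(components: Dict[str, List[float]]) -> Dict[str, float]:
    """Weights proportional to 1 / (std. dev. of each component's changes), summing to 1."""
    inverse_sd = {name: 1.0 / pstdev(changes(levels)) for name, levels in components.items()}
    total = sum(inverse_sd.values())
    return {name: value / total for name, value in inverse_sd.items()}


def composite_index(components: Dict[str, List[float]], weights: Dict[str, float],
                    start: float = 100.0) -> List[float]:
    """Cumulate the weighted sum of component changes into an index level."""
    per_component = {name: changes(levels) for name, levels in components.items()}
    n_periods = len(next(iter(per_component.values())))
    index = [start]
    for t in range(n_periods):
        c = sum(weights[name] * per_component[name][t] for name in components)
        # Invert the symmetric change so that I_t / I_{t-1} = (200 + c) / (200 - c).
        index.append(index[-1] * (200.0 + c) / (200.0 - c))
    return index


# Hypothetical component histories (not real LEI inputs).
history = {
    "building_permits": [100.0, 98.0, 95.0, 97.0, 92.0],
    "new_orders": [50.0, 49.5, 49.0, 48.0, 47.5],
    "stock_prices": [1400.0, 1350.0, 1300.0, 1200.0, 900.0],
}
weights = standardization_factors(history)
print([round(level, 1) for level in composite_index(history, weights)])
```

The symmetric change is used because its inverse recovers the original ratio exactly: if every component grows 10% in a month, the composite also grows exactly 10%. Weighting by inverse volatility stops a single noisy series, such as stock prices, from dominating the headline figure.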

Unfortunately, the PMI doesn’t make for more cheerful reading. This survey of manufacturing purchasing managers looks at inputs such as commodity prices, new orders, order backlogs, employment plans, and customers’ inventories, and also produces a headline figure. A reading of 50 is the dividing line: readings above 50 point to expansion in manufacturing, and readings below 50 point to contraction. The November manufacturing PMI stands at 36.2, the worst reading since May 1982, and significant gains will be required to indicate that an economic bottom is in place.
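The arithmetic behind the PMI is easy to sketch. ISM reports each survey question as a diffusion index: the percentage of respondents reporting an increase plus half the percentage reporting no change. The headline PMI is an average of several of these component indexes. In the sketch below, the equal weighting, the component names and the survey shares are assumptions for illustration, not ISM's actual November responses.

```python
from typing import Dict, Optional


def diffusion_index(pct_higher: float, pct_same: float) -> float:
    """Percent reporting an increase plus half the percent reporting no change."""
    return pct_higher + 0.5 * pct_same


def headline_pmi(components: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of component diffusion indexes (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 / len(components) for name in components}
    return sum(weights[name] * value for name, value in components.items())


def read_pmi(value: float) -> str:
    """50 is the dividing line between expansion and contraction."""
    if value > 50.0:
        return "expanding"
    if value < 50.0:
        return "contracting"
    return "unchanged"


# Hypothetical survey shares: (percent reporting higher, percent reporting same).
survey = {
    "new_orders": (10.0, 40.0),
    "production": (12.0, 46.0),
    "employment": (8.0, 50.0),
    "supplier_deliveries": (15.0, 60.0),
    "inventories": (9.0, 44.0),
}
components = {name: diffusion_index(*shares) for name, shares in survey.items()}
pmi = headline_pmi(components)
print(round(pmi, 1), read_pmi(pmi))
print(read_pmi(36.2))  # the November reading quoted above -> contracting
```

Because "no change" counts for half, a month in which equal shares of respondents report increases and decreases lands exactly on 50, which is why 50 separates expansion from contraction.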

Finally, watch payroll data rather than the unemployment rate, which can be skewed by seasonal factors and by changes in the labor force participation rate. Payroll figures, by contrast, count jobs created and people back at work, and a rise there is generally followed fairly quickly by a similar rise in retail sales.
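A small worked example shows how participation can skew the unemployment rate. The rate is the number of unemployed divided by the labor force (employed plus unemployed). So when jobless people stop looking for work and drop out of the labor force, the rate falls even though not a single job has been created. The figures below are hypothetical, in millions of people.

```python
def unemployment_rate(employed: float, unemployed: float) -> float:
    """Unemployed as a percentage of the labor force (employed + unemployed)."""
    return 100.0 * unemployed / (employed + unemployed)


# Hypothetical labor market, in millions of people.
employed, unemployed = 140.0, 10.0
before = unemployment_rate(employed, unemployed)

# 2 million jobless people give up searching and leave the labor force.
# Employment is unchanged, yet the unemployment rate falls.
after = unemployment_rate(employed, unemployed - 2.0)

print(f"before: {before:.2f}%, after: {after:.2f}%, jobs created: 0")
```

The rate drops from about 6.7% to about 5.4% while payrolls stay flat, which is why a payroll count is the cleaner signal of people actually getting back to work.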

Like all predictions, this one carries certain caveats, the biggest being that another shock to the global financial network would be much harder to absorb at this stage of the economic game. But indicators don’t lie, and sooner or later that light ahead really will be the end of the tunnel. Watch for it.