STRS Investment Beliefs: The Emperor’s New Clothes
Rudy Fichtenbaum
February 13, 2023
I want to speak about the issue of active management and about the whitepaper, “Investment Belief: Active Management” that STRS OH investment staff shared with the Trustees less than a week before our January meeting. The paper is 38 pages long and is based on analysis that was laid out in two Excel spreadsheets. The first has 25 tabs, many of which are linked; that is, the calculations in one tab can depend upon calculations done in another, making it doubly difficult to verify even one result. I am still working my way through parts of the whitepaper and the spreadsheets. Nevertheless, I am prepared today to offer some observations about the whitepaper.
First, let me comment on the issue of the returns used in the analysis. All the returns are gross returns, which in STRS speak means that the returns for all the asset classes are gross of fees; that is, the returns have not had fees incurred by STRS subtracted out. The only exception is for real estate and alternative investments, which are allegedly net of all fees, including carried interest. I use the word allegedly because there is no real way to verify that the real estate and alternative returns are net of fees. STRS claims that CEM verifies its fees, but returns are for a fiscal year and CEM “analyzes STRS fees on a calendar year basis that lags the fiscal year”. In other words, the periods of the returns and the fees do not match. Further, where does CEM get the expense information? It gets it from STRS. Are these expenses audited? No! More importantly, if STRS has the actual data on fees, why doesn’t it present the returns so that they are totally net of fees? Relying on gross returns, as both STRS and Callan do in presentations to the Board comparing our performance over time and against other pensions, is at best highly suspect.
Two pensions with the same gross returns and same cash flows can end up with different amounts of money with which to pay benefits if the expenses of one pension differ from those of another. All that really matters is how much money is available to pay benefits, and that depends upon expenses. Here’s an analogy. Suppose I go into a store and buy something for $100 – plus tax – and I write a check to pay. When I record the check in my register, do I subtract $100 from the balance in my checking account, or do I subtract $100 plus tax? When I tell someone how much I paid for the item I say I paid $100. But if I enter the price without the tax in my check register and keep up that bad practice, there is a good chance that I will end up with an overdraft – bouncing a check due to insufficient funds. On that basis alone, one could dismiss the analysis in the whitepaper.
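The gross-versus-net point can be made concrete with a small sketch. The returns, expense ratios, and outflows below are invented for illustration, not STRS figures; the only point is that with identical gross returns and identical benefit payments, the higher-cost fund ends with less money to pay benefits, and the gap compounds every year.

```python
# Two hypothetical pensions: same gross returns, same benefit outflows,
# different annual expenses. Only the net-of-fee balance can pay benefits.

def ending_balance(start, gross_returns, expense_ratio, annual_outflow):
    """Roll the fund forward: earn the gross return, pay fees, pay benefits."""
    balance = start
    for r in gross_returns:
        balance = balance * (1 + r - expense_ratio) - annual_outflow
    return balance

gross = [0.07] * 20  # an assumed flat 7% gross return for 20 years

low_cost = ending_balance(1_000_000, gross, 0.001, 40_000)   # 10 bps in fees
high_cost = ending_balance(1_000_000, gross, 0.010, 40_000)  # 100 bps in fees

print(f"low-cost fund ends with:  {low_cost:,.0f}")
print(f"high-cost fund ends with: {high_cost:,.0f}")
```

Both funds would report identical gross performance, yet the difference in ending balances is exactly the kind of gap that gross-of-fee reporting hides.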
Beyond that, however, there are many additional problems with the whitepaper. One of these problems is related to STRS Benchmarks. For the entire period considered, the longest being from FY93 through September of 2022, using monthly data, the benchmark for alternative investments is STRS’s own performance. That means for a significant portion of our portfolio, especially in recent years, there really is no independent benchmark. But it’s even worse than that. The draft report issued by the Auditor stated, “…it is a mathematical certainty that computing the total fund benchmark based on budgeted allocations rather than actual allocations will reduce the alternatives benchmark if two conditions are met.” (By the way, “budgeted allocations” are sometimes called “policy weights”.)
What are those two conditions? To answer, let me first remind you of some jargon we use in discussing STRS’s alternative investments. STRS’s alternative investments are divided into two categories, namely private equity (PE) investments and opportunistic/diversified (OD) investments. The first condition is that PE investments outperform OD investments. The second is that the actual weight of PE investments exceeds the policy weight (that is, the target weight). The draft auditor’s report noted that “Both conditions were met in fiscal year 2021.” Miraculously, this language was removed from the final report. However, removing the language does not change the reality that the statement in the draft audit was and is still true. Now the staff may claim that they do not calculate returns for PE and OD separately when they calculate the total fund benchmark. That may in fact be true, but it does not change the reality that the alternative return used to calculate the benchmark is a weighted average of the PE and OD returns. This weighted average is not the simple weighted average that all of us learn to take in elementary statistics. It is more complicated mathematically because it must account for the daily weighted cash flows used to calculate Modified Dietz returns; STRS uses these Modified Dietz returns to report monthly returns by asset class and for the total fund. I would be happy to provide the mathematical proof that the alternative return used to calculate the benchmark is a weighted average of the PE and OD returns; I provided that proof to the Auditor. I have also verified empirically that the monthly return for alternatives is the weighted average of the PE and OD returns, using data provided to me by the STRS investment staff.
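The weighted-average claim above can be sketched in a few lines. The market values and cash flows here are made up for illustration; the point is that when two sub-portfolios are combined, the Modified Dietz return of the whole is a weighted average of the sub-returns, with each weight equal to that sub-portfolio’s share of the Dietz denominator (beginning value plus day-weighted flows).

```python
# Sketch: the combined Modified Dietz return equals a weighted average
# of the PE and OD Modified Dietz returns. All figures are hypothetical.

def modified_dietz(bmv, emv, flows, days_in_period=30):
    """flows: list of (day, amount); contributions positive, withdrawals negative.
    Returns (return, denominator). Each flow is weighted by the fraction
    of the period it was invested."""
    net_flow = sum(amt for _, amt in flows)
    weighted = sum(amt * (days_in_period - day) / days_in_period
                   for day, amt in flows)
    base = bmv + weighted
    return (emv - bmv - net_flow) / base, base

# A hypothetical month for the PE and OD sub-portfolios
r_pe, base_pe = modified_dietz(bmv=800, emv=860, flows=[(10, 20)])
r_od, base_od = modified_dietz(bmv=200, emv=204, flows=[(20, -10)])

# The combined portfolio: sum the market values and pool the flows
r_all, _ = modified_dietz(bmv=1000, emv=1064, flows=[(10, 20), (20, -10)])

# Base-weighted average of the sub-returns reproduces the combined return
w_pe = base_pe / (base_pe + base_od)
weighted_avg = w_pe * r_pe + (1 - w_pe) * r_od
print(f"combined return: {r_all:.6f}")
print(f"weighted average of PE and OD: {weighted_avg:.6f}")
```

The weights here depend on the day-weighted cash flows, not just the beginning market values, which is why this is not the simple weighted average from elementary statistics.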
Another advantage STRS gives itself is using a benchmark for real estate that does not take leverage into account. Even if there are reasons for using that benchmark, it should be adjusted to take into consideration STRS’s use of leverage in its real estate investments to improve its returns.
So, here’s the bottom line with respect to the STRS benchmark used in the analysis in the whitepaper. STRS cannot meaningfully underperform the alternatives benchmark, because STRS’s own return is the benchmark, and STRS gives itself a further advantage by using a real estate benchmark that does not account for leverage. This artificially lowers the total fund benchmark, creating the illusion that active management outperformed passive investing. Let me provide another analogy. Imagine that you enter a foot race. Each competitor steps up to the starting line, runs when the gun sounds until she crosses the finish line, and the judge keeps score by noting the time it takes to get from the starting line to the finish line. If you are the STRS racer, however, you get to move the finish line a little closer as you run, thereby improving the odds that you will win.
The whitepaper also criticizes my use of an analysis of STRS performance done by Richard Ennis. For those of you unfamiliar with Richard Ennis, he is a chartered financial analyst (CFA) who managed money at Transamerica and pioneered quant investing in the early 1970s. He helped create the field of institutional investment consulting at A.G. Becker & Co. He also co-founded Ennis Knupp, the first investment consulting firm to be recognized as a professional services firm. He received an award from the CFA Institute and served as editor of the Financial Analysts Journal. After the Coingate scandal of 2005, the Ohio Bureau of Workers’ Compensation hired Ennis Knupp to conduct a valuation of its alternative investments to reduce its exposure to a “high risk” portfolio of alternative investments.
The STRS whitepaper criticizes Ennis for using annual data and claims to replicate his analysis using both annual and monthly data. In his analysis, which has been published in the Journal of Portfolio Management, Ennis creates a passive benchmark using so-called style analysis, a technique developed by the Nobel Prize-winning economist William F. Sharpe. Using monthly data, STRS claims that it outperforms a passive index constructed using Ennis’s technique. However, the STRS investment staff did the analysis incorrectly. The analysis involves creating a benchmark from a weighted average of three publicly available indices that matches the style of investing of a particular pension. Sharpe’s article states that the objective is to minimize the variance of the tracking error, where the tracking error is the difference between the pension fund’s return and the weighted average of the indices. The weights are determined by quadratic programming: an algorithm searches for the set of weights applied to the various indices that minimizes the variance of the tracking error, subject to the usual constraints that the weights sum to one and are all non-negative (the assumption of no short selling). Sharpe explicitly states in his article, “Note that the objective of such an analysis is not to minimize … the sum of the squared differences.” However, that is exactly what the STRS investment staff did in their analysis. They minimized the sum of squared errors with a single constraint, namely that the parameters sum to one. So technically the analysis done by the investment staff is just wrong.
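The difference between the two objectives is not pedantic, and a small synthetic example shows why. Everything below is invented for illustration (two made-up indices and a “fund” that holds a 60/40 mix of them plus a constant alpha): the variance of the tracking error ignores a constant alpha and recovers the true style weights, while the sum of squared errors penalizes the alpha itself and gets pulled toward different weights. A brute-force grid search stands in for the quadratic program.

```python
# Sharpe's objective (variance of tracking error) versus minimizing the
# raw sum of squared tracking errors, on synthetic data. Two indices only,
# so the weight on index A determines the whole portfolio.
import random

random.seed(1)
n = 120  # ten years of monthly observations (hypothetical)
idx_a = [random.gauss(0.010, 0.04) for _ in range(n)]  # stock-like index
idx_b = [random.gauss(0.002, 0.01) for _ in range(n)]  # bond-like index
# The "fund": 60% A + 40% B plus a constant 1%/period of skill (alpha)
fund = [0.6 * a + 0.4 * b + 0.01 for a, b in zip(idx_a, idx_b)]

def tracking_errors(w):
    return [f - (w * a + (1 - w) * b) for f, a, b in zip(fund, idx_a, idx_b)]

def variance(e):
    m = sum(e) / len(e)
    return sum((x - m) ** 2 for x in e) / len(e)

def sse(e):
    return sum(x ** 2 for x in e)

grid = [i / 1000 for i in range(1001)]  # candidate weights on index A in [0, 1]
w_var = min(grid, key=lambda w: variance(tracking_errors(w)))
w_sse = min(grid, key=lambda w: sse(tracking_errors(w)))

print(f"variance objective recovers w_a = {w_var:.3f} (true mix is 0.600)")
print(f"sum-of-squares objective gives w_a = {w_sse:.3f}")
```

Because the variance objective subtracts the mean tracking error before squaring, a fund with genuine constant outperformance is matched on style alone; the sum-of-squares objective instead distorts the weights to absorb the outperformance, which is exactly what Sharpe warns against.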
Beyond that, they use gross returns, and Ennis of course uses net returns. According to Ennis, using gross vs. net adds 10-15 bps to STRS’s performance. I was able to replicate Ennis’s analysis using Excel Solver (the same tool used by STRS investment staff), and a colleague of mine also verified the results using Mathematica. Ennis’s analysis shows that STRS loses to a passive benchmark by an average of 1.6% a year over 13 years. What does losing by 1.6% a year to a benchmark mean? Other things being equal, a difference of 1.6% over 13 years on a $50 billion investment amounts to $25.6 billion of lost income.
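As a back-of-the-envelope check on the magnitude of that figure: the gap from compounding a 1.6%-per-year shortfall depends on the base return, which the discussion above does not state, so the 7% used below is my assumption purely for illustration.

```python
# Foregone growth from trailing a benchmark by 1.6%/yr for 13 years
# on a $50 billion portfolio. The 7% base return is an assumed figure;
# a different base return gives a somewhat different gap.

principal = 50.0   # $ billions
years = 13
base = 0.070       # assumed annual return earned by the fund
lag = 0.016        # annual shortfall versus the passive benchmark

gap = principal * ((1 + base + lag) ** years - (1 + base) ** years)
print(f"foregone growth over {years} years: ${gap:.1f} billion")
```

With a 7% base return the compounding arithmetic lands very close to the $25.6 billion cited; a lower base return shrinks the gap somewhat, a higher one enlarges it.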
I want to be clear that this does not mean STRS would necessarily have $25.6 billion more if it had used passive investing, because the value of STRS’s assets depends not only on its investment returns but also on how those returns interact with STRS’s cash outflows. Every month, contributions are $331 million less than expenses, most of which are benefit payments. But estimating how much more money STRS has due to active investing by using smoothed returns from monthly data, along with cash-flow estimates derived by dividing annual cash flow data by 12 (which smooths the cash flows), is clearly wrong.
Moreover, STRS cherry-picked the data in its previous analysis, as the Auditor’s report confirmed. Thus, it is safe to say that if STRS had used index investing, we would be a lot closer to being able to restore a COLA and allow teachers to retire with unreduced benefits after 30 years.
I did my own analysis a few months ago using calendar year data, comparing STRS’s performance to a portfolio that started with a 60-40 stock-bond mix and moved to a 70-30 stock-bond mix. I shared this analysis with the Board, but it was ignored because the Board was seemingly more focused on how I conveyed the analysis than on the information it contained. For the stock portion of the portfolio, I used the S&P 500, and for the bonds, I used a mix of 3-month Treasury bills, 10-year Treasury notes, and Baa corporate bonds. I found that STRS lost to the portfolio I constructed by 0.87% a year over 30 years. Moreover, my portfolio had less risk; to be specific, the returns of my portfolio had a standard deviation of 10.94%, compared to a standard deviation of 11.30% for STRS’s actual returns.
STRS’s use of monthly data also has other problems. The risk – again, as measured by the standard deviation of returns – embedded in the monthly returns is 20% less than the risk of the annual returns for the same period. STRS’s use of monthly data has the effect of dampening volatility, i.e., smoothing the returns. Smoother returns create the illusion of less risk. So, when solving the quadratic programming problem, Sharpe’s methodology will select a portfolio with more bonds to match the lower volatility artificially created by using monthly data. Since bonds have a lower return than stocks, the benchmark created in STRS’s analysis makes it easier for STRS to beat the benchmark.
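The mechanism behind this understatement is worth seeing once. When monthly returns are positively autocorrelated, as smoothed return series are, scaling the monthly standard deviation up to an annual figure understates the volatility of the returns you actually experience over a year. The sketch below uses synthetic AR(1) returns with made-up parameters, not STRS data, and treats the annual return as the sum of twelve monthly (log) returns.

```python
# Smoothed (positively autocorrelated) monthly returns understate risk:
# the monthly std scaled by sqrt(12) comes out smaller than the std of
# the realized annual returns. Parameters below are purely illustrative.
import math
import random

random.seed(0)
phi, sigma_eps, years = 0.5, 0.03, 2000
monthly, r = [], 0.0
for _ in range(years * 12):
    r = phi * r + random.gauss(0, sigma_eps)  # AR(1): this month echoes last
    monthly.append(r)

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Each year's return is the sum of its 12 monthly log returns
annual = [sum(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]

from_monthly = std(monthly) * math.sqrt(12)  # risk inferred from monthly data
from_annual = std(annual)                    # risk actually borne over a year

print(f"annualized monthly std: {from_monthly:.4f}")
print(f"annual-return std:      {from_annual:.4f}")
```

With independent monthly returns the two numbers would roughly agree; the positive autocorrelation is what opens the gap, and a style analysis fed the smaller monthly-based volatility will tilt the benchmark toward bonds.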
By now, this should sound familiar. Create a benchmark that allows you to win. The results, sent to me by Ennis, show that the R-square (the percentage of the total variation explained by the model) obtained by regressing STRS returns on the predicted return using annual data is 99.2%, whereas the R-square using monthly data is 97.3%. Also, the annualized tracking error (the standard deviation of the difference between the STRS return and the benchmark return) is 1.1% for annual data and 1.6% for monthly data, so the tracking error for the monthly data is 46% higher than for the annual data. Finally, while this is getting even further into the weeds, to make valid statistical inferences as STRS does, one must assume that the distribution of the returns is normal. To test the normality of the distribution of returns we can use a Jarque-Bera test. The null hypothesis is that the data are normally distributed. If we reject the null hypothesis, then we conclude that the distribution is not normal, and therefore inferences from estimates made using non-normal data are not valid. Using annual data, Ennis found that the null hypothesis was not rejected at the 6% level of significance, so the annual data are consistent with normality; but using monthly data, the null hypothesis was rejected at the 0.001 level of significance, so the monthly data are not normally distributed.
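For readers unfamiliar with it, the Jarque-Bera statistic is short enough to write out. It combines sample skewness and excess kurtosis; under normality it is approximately chi-squared with 2 degrees of freedom, so values above roughly 5.99 reject normality at the 5% level. The two samples below are synthetic, just to show the mechanics: one normal, one with occasional large shocks of the kind that show up in crisis months.

```python
# A minimal Jarque-Bera test: JB = n/6 * (skew^2 + (kurtosis - 3)^2 / 4).
# Under the null of normality, JB ~ chi-squared(2); 5% critical value ~5.99.
import random

def jarque_bera(xs):
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    skew = sum((x - m) ** 3 for x in xs) / n / s2 ** 1.5
    kurt = sum((x - m) ** 4 for x in xs) / n / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3) ** 2 / 4)

random.seed(42)
normal_sample = [random.gauss(0, 1) for _ in range(500)]
# Fat-tailed sample: a 5% chance each period of a much larger shock
fat_tailed = [random.gauss(0, 1) if random.random() > 0.05
              else random.gauss(0, 6) for _ in range(500)]

print(f"JB, normal data:     {jarque_bera(normal_sample):.2f}")
print(f"JB, fat-tailed data: {jarque_bera(fat_tailed):.2f}")
```

A fat-tailed series produces a JB statistic far above the critical value, which is the pattern Ennis found in the monthly data.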
In conclusion, our situation is analogous to the fairy tale by Hans Christian Andersen about an emperor who is hoodwinked by weavers into believing they have made him a suit of invisible – in fact, nonexistent – clothes. Everyone in the town recognizes that he is naked, but as he walks through the town, no one will say anything because they don’t want to appear inept or face his wrath. Finally, a child blurts out that the emperor is naked, but the emperor just continues walking as if nothing is wrong. It is time for the Board to stand up and recognize that the current system is broken, and it is time for change.