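Original article: https://www.fangraphs.com/tht/rethinking-the-win-curve/
Preface: this was a presentation at last year's Saberseminar. I will add some
comments of my own along the way. It is also a good example for thinking
about why "there is really no such thing as a pure stats camp."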
Transaction analysis has become one of the most important topics in all of
baseball research. The primary tool that is used in almost all public
transaction analysis is a dollars/WAR calculation. This tool has been
convenient in many applications. It is especially good for predicting how
much money an upcoming free agent will receive on the open market.
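Transaction analysis has become one of the hottest areas of baseball research,
and the most commonly used tool for it is the dollars-per-WAR calculation.
Plenty of analysts use this convenient tool to estimate and predict how much
a free agent should get on the market!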
However, dollars/WAR also has significant limitations. In many situations,
it simply does not answer the questions we are interested in. As an example,
let's look at the Chris Sale trade to the Boston Red Sox. Last offseason,
the White Sox traded Sale to the Red Sox in exchange for Yoan Moncada,
Michael Kopech, Luis Alexander Basabe and Victor Diaz.
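But dollars/WAR has obvious limitations. In my view, it ignores the marginal
benefit of each WAR: not every WAR is worth the same, just as not every
opinion is worth the same. Some opinions are useless or unconstructive, and
some WAR simply has no impact. A calculation that ignores market supply and
demand and team preferences cannot estimate salary accurately!
It also ignores volatility, so a salary estimated outside a risk-neutral
probability measure is not the true expected salary. In other words, it does
not account for aging, injury risk, performance swings, or the time value of
WAR. A decline of 0.5 WAR per year may be a fair rule of thumb, but it leaves
out too much: multiplying a projection that drops 0.5 WAR a year by $9M/WAR
is a fairly inaccurate formula.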
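For reference, the naive calculation criticized above can be sketched in a few
lines. This is only an illustration: the function name and the floor at zero
WAR are assumptions, while the $9M/WAR rate and the 0.5 WAR yearly decline are
the figures quoted above.

```python
def naive_contract_value(war_now, years, dollars_per_war=9.0, decline_per_year=0.5):
    """Flat $/WAR valuation in $M: projected WAR each year times one fixed rate."""
    total = 0.0
    for t in range(years):
        # Assumed: WAR declines linearly and never goes below zero.
        projected_war = max(war_now - decline_per_year * t, 0.0)
        total += projected_war * dollars_per_war
    return total


# A 5-WAR player over three years: (5 + 4.5 + 4) WAR at $9M each.
print(naive_contract_value(5.0, 3))
```

Note that nothing in this calculation depends on which team signs the player,
which is exactly the limitation the article goes on to address.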
By a dollars/WAR calculation, this was pretty close to an even trade,
seeming to be mutually beneficial. The Red Sox preferred the present wins
in the form of a starting pitcher, and the White Sox preferred the prospects.
A dollars/WAR calculation does not allow us to see how much each team
benefitted from this deal. I will present a new framework that can help
teams make decisions under uncertainty. We will be able to see how much
any player is worth to any team given a team's preferences and the
player's impact on the team's projections.
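The author proposes some ideas, using last year's Sale trade as the example.
By dollars/WAR this looks like a fairly even trade, and both sides appear to
benefit: the Red Sox wanted a top starter, while the White Sox are rebuilding
and needed good prospects.
But dollars/WAR cannot tell us how much each team actually gained.
So the author builds a new framework for making decisions under uncertainty
and team preferences. Given a team's preferences, it shows what a player is
worth to that team and how the player changes the team's projection.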
About 10 years ago, Vince Gennaro, Nate Silver and others began writing
about the win curve. The win curve graphs the marginal value of an
additional win given a team's final record. Here is the win curve Silver
came up with.
About ten years ago several people began studying the win curve. Given a
team's season win total, it plots how much one more win is worth. Below is
Nate Silver's chart:
https://i.imgur.com/inZ63r8.gif
As you can see, the value of a win peaks around win number 90. These wins
greatly increase a team's chances of reaching the playoffs, advancing in
the playoffs, and winning the World Series–many of the things teams really
care about. Focusing on a team's regular-season win total can serve as an
effective proxy for these goals.
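As you can see, the marginal value of a win peaks at around 90 wins. Once a
team's win total puts the playoffs within reach, each win becomes more
valuable; after all, the playoffs and a World Series title are what teams
really care about!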
I will distinguish Gennaro's win curve by calling it a roster value curve.
Gennaro graphed the total value of a roster over the number of wins it
produced. The win curve is simply a graph of the derivative of the roster
value curve. The key finding of Silver and Gennaro was that, for the
most part, wins are not created equal. Teams in close contention for
the playoffs see much more marginal value from additional wins.
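Gennaro's chart is called the roster value curve here: it plots total value
against team wins. The win curve is simply the derivative of the roster value
curve. Both authors found that, for the most part, wins are not equally
valuable, and that teams closer to playoff contention get more marginal value
per win. Take the Angels over the past few days: a team on the playoff bubble
that extended Upton and also managed to sign Ohtani!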
https://i.imgur.com/XX71dkM.png
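Oh ho! Look at this chart. My Braves really do spend every dollar
efficiently: the cost of each additional win barely changes. Compared with
the big-money Yankees, they are not only more efficient but also need to
spend less to reach the playoffs!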
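The "win curve is the derivative of the roster value curve" relationship can
be sketched with a one-win finite difference. The curve below is made up for
illustration (a logistic shape with its steepest point near 90 wins); it is
not Silver's or Gennaro's data, and the function names are assumptions.

```python
import math


def roster_value(wins):
    """Hypothetical ex post roster value in $M (assumed logistic shape)."""
    return 5.0 + 1.0 * wins + 130.0 / (1.0 + math.exp(-(wins - 90) / 3.0))


def win_curve(wins):
    """Marginal value of the next win: a one-win finite difference."""
    return roster_value(wins + 1) - roster_value(wins)


for w in (70, 80, 89, 100):
    print(w, round(win_curve(w), 2))
```

With this shape the marginal value is small far from contention and largest
around 89 to 90 wins, which is the hump the charts show.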
There has been a lot of debate about what the win curve actually looks
like. Many have argued that Silver significantly undervalued wins that did
not affect playoff odds. For this analysis, it is important to remember
the win curves are for the teams to decide. The curve simply shows how
much a team values each potential outcome.
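There is also plenty of debate about what the win curve should actually look
like. Some say Silver clearly undervalued wins that do not affect playoff
odds. But remember that a win curve is specific to one team: it shows how
much that one team values each potential outcome!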
Silver and Gennaro focused on how much these win totals affected a season's
total revenue. However, other factors could shape a team's preferences,
such as how much an owner wants to win a World Series or how much one year's
win total impacts future revenues. The preferences of any team can only be
decided by the team itself. I will use a roster value curve as a team's
unique utility function.
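Silver and Gennaro both focused on how win totals affect a season's total
revenue, but other factors can shape a team's preferences, such as how badly
the owner wants a World Series title (think of Hal's preference in signing
Ellsbury), or how one good season lifts future revenue (much like the surge
of bandwagon fans on a team's board). Each team makes these decisions
according to its own utility function, and this article uses the roster
value curve as the team's utility function.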
While a team could identify its preferences and build its own win curve,
the curve has limited ability in helping to guide its decisions. Obviously,
when teams have to make roster decisions, they are operating under
uncertainty. They do not know where they will land on the win curve or
exactly how their potential acquisitions will perform. Phil Birnbaum
wisely pointed this out when commenting on Silver's win curve:
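Each team can build its own win curve from its own preferences, but the curve
is still of limited help in making decisions. When teams set their rosters
they face uncertainty: they do not know where on the win curve they will
actually land, nor how an acquisition will actually perform. Phil Birnbaum
made this point sharply against Silver's win curve: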
"Silver's graph tells us how much an *actual* win is worth. But, before
the season starts, a team can't know how many wins it will achieve with
that kind of precision. Even if it's perfectly omniscient about how much
*talent* its team has, there's still a standard deviation of about six
wins between talent and achievement. A team that's created to be perfectly
average in every respect should go 81-81–but, just by random chance,
it will win fewer than 75 games about one time in six, and it'll win
more than 87 games one time in six."
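In other words: Silver's graph tells us how much an *actual* win is worth,
but before the season no team knows how many wins a given set of decisions
will produce. Even with perfect knowledge of the team's *talent*, results
still vary around that talent with a standard deviation of about six wins,
and that is a big spread. For a true .500 team, the win total lands between
75 and 87 about 68% of the time, which is already the difference between the
edge of the playoffs and a losing record. It lands between 69 and 93 about
95% of the time, which is the difference between a bad team and a playoff
team or even a division winner.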
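Birnbaum's "one time in six" figures and the 68%/95% ranges above can be
checked quickly. This is a sketch that assumes win totals are normally
distributed around 81 with a standard deviation of six and ignores that wins
are whole numbers; normal_cdf is an illustrative helper, not something from
the article.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean, sd = 81, 6
p_below_75 = normal_cdf(75, mean, sd)
p_above_87 = 1.0 - normal_cdf(87, mean, sd)
p_75_to_87 = normal_cdf(87, mean, sd) - normal_cdf(75, mean, sd)
p_69_to_93 = normal_cdf(93, mean, sd) - normal_cdf(69, mean, sd)

print(round(p_below_75, 3), round(p_above_87, 3))  # each is close to 1/6
print(round(p_75_to_87, 3), round(p_69_to_93, 3))  # one and two SD ranges
```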
Birnbaum correctly concludes that this uncertainty will make the hump in
the graph wider and shorter. However, we can be a lot more precise and end
up with a much more useful result. We should leave the win curve as it is
and call it an ex post win curve. The ex post win curve will simply show
the value of each marginal win at the end of the season. From this we can
model a preseason, ex ante win curve, which will show the value of a
marginal projected win. This stochastic model will allow us to find
the marginal value of adding a given player to a specific team.
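Birnbaum's criticism is right on target. Uncertainty feeds into the standard
deviation, and the standard deviation shapes the win curve: judging by the
hump, wider means a larger standard deviation and narrower a smaller one.
More uncertainty generally means a larger standard deviation, so my guess is
that the win curve will shift and that the standard deviation will be more
than six wins. The author introduces two names: the ex post win curve and
the ex ante win curve. The former is the original win curve, showing the
value of each additional win once the season is over. The latter is a
preseason forecast built on projected wins, so instead of end-of-season
observations we use preseason predictions for the roster. That lets us
estimate the impact of adding a player and the marginal value of each
projected win.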
To start, I will use a hypothetical ex post roster value curve.
And here is its corresponding ex post win curve. I chose to use a curve
that looked a lot like Silver's.
We start with a hypothetical ex post roster value curve; the second chart is
the corresponding ex post win curve. The author picked a curve that looks a
lot like Silver's.
https://i.imgur.com/47SmBiX.jpg
https://i.imgur.com/ZExBzHV.jpg
Next, I simply used probability mass functions to come up with
distributions of potential records given preseason forecasts. I created
normal distributions of win total projections with means between 60 and
100, all with a standard deviation of eight wins. The uncertainty in
these forecasts comes from three main sources: random variation, injuries,
and uncertainty of players’ true talent. Here is the probability mass
function for a team projected to win 81 games.
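Next, the author uses probability mass functions (pmf) based on the normal
distribution to model the range of final records given a preseason forecast.
(A pmf is used because win totals are whole numbers, so the distribution is
discrete.) The means are every projected win total from 60 to 100, each with
a standard deviation of eight wins. That standard deviation reflects several
sources of uncertainty: injuries, uncertainty about true talent, and random
variation. The chart below uses a mean of 81 wins and a standard deviation
of 8.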
https://i.imgur.com/YfL6gAF.jpg
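One way to build such a pmf is sketched below. The article does not say
exactly how the normal distribution was discretized, so the approach here
(weight each whole win total by the normal density, then normalize over 0 to
162) is an assumption, as is the name win_pmf.

```python
import math


def win_pmf(projected_wins, sd=8.0, games=162):
    """Discretized normal: density at each whole win total, normalized to sum to 1."""
    weights = [
        math.exp(-0.5 * ((w - projected_wins) / sd) ** 2)
        for w in range(games + 1)
    ]
    total = sum(weights)
    return [wt / total for wt in weights]


pmf = win_pmf(81)
most_likely = max(range(len(pmf)), key=lambda w: pmf[w])
print(most_likely, round(sum(pmf), 6))
```

Normalizing over the 163 possible win totals keeps the probabilities summing
to 1 even for projections near the ends of the range.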
Using these distributions and the roster value curve, I found the expected
values of the rosters projected to win between 60 and 100 games. The value
of any projected record (or any asset in general ) is the sum of the
probabilities of ending up in every potential state multiplied by the value
of ending up in these states–our discounted expected payoff.
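Using these distributions and the roster value curve, the author finds the
expected value of rosters projected to win between 60 and 100 games. The
value of a projected record (or, more generally, of any asset) is the sum
over every possible state of its probability times its value, that is, the
discounted expected payoff.
There is something important tucked in here: wins are being treated as an
asset, and discounting has been introduced. My guess is that this assumption
is needed later so that asset pricing can be applied.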
These expected values were easy to calculate because there are a discrete
number of outcomes for any season; a team can win between zero and 162 games.
Here is the formula I used.
This is just the expected value of a discrete probability distribution:
https://i.imgur.com/Yu2vq9e.jpg
x = projected final win total, w = actual final win total (0-162),
p(w|x) = conditional probability of finishing with w wins given projection x,
z_w = payoff from finishing with w wins
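Since the formula is only available as an image, here is how it would read
when written out from the symbols x, w, p(w|x) and z_w. The notation on the
left-hand side is an assumption, not necessarily what the image uses.

```latex
E[\, z \mid x \,] = \sum_{w=0}^{162} p(w \mid x)\, z_w
```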
If the summation notation is unclear, here is a quick example:
E(Projected Win Total) = … + p(65 wins)*payoff(65 wins) +
p(66wins)*payoff(66 wins) +
p(67 wins)*payoff(67 wins) + …
This part is long-winded; in short, multiply each scenario's probability by
its payoff and add them all up.
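The same sum can be sketched in code. Everything here is hypothetical: the
roster value curve is a made-up logistic shape, and the pmf is a normalized
discretized normal, neither of which comes from the article.

```python
import math


def roster_value(wins):
    """Hypothetical ex post payoff z_w in $M (assumed logistic shape)."""
    return 5.0 + 1.0 * wins + 130.0 / (1.0 + math.exp(-(wins - 90) / 3.0))


def win_pmf(projected_wins, sd=8.0, games=162):
    """Assumed pmf p(w|x): normal density at each whole win total, normalized."""
    weights = [
        math.exp(-0.5 * ((w - projected_wins) / sd) ** 2)
        for w in range(games + 1)
    ]
    total = sum(weights)
    return [wt / total for wt in weights]


def expected_roster_value(projected_wins, sd=8.0):
    """Sum over w of p(w|x) * z_w."""
    pmf = win_pmf(projected_wins, sd)
    return sum(p * roster_value(w) for w, p in enumerate(pmf))


for x in (75, 81, 90):
    print(x, round(expected_roster_value(x), 1))
```

As a sanity check, shrinking sd toward zero makes the expected value collapse
to roster_value(x) itself, since all the probability sits on one outcome.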
Once we have the expected values of the projections, we can plot a preseason,
ex ante roster value curve. This roster value curve will show us the value of
rosters given their projected win total.
From here, we can easily build a new ex ante win curve.
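With the expected values computed from projected wins, we can draw the ex
ante roster value curve, the value of a roster given its projected win total,
and from that it is easy to derive the new ex ante win curve.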
https://i.imgur.com/5TaIF4g.jpg
https://i.imgur.com/hbaBIb5.jpg
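How the ex ante win curve follows from the ex post one can be sketched as
below. The roster value curve and the discretized-normal weights are
assumptions made for illustration, so the printed numbers will not match the
article's charts; only the qualitative shape is the point.

```python
import math


def roster_value(wins):
    """Hypothetical ex post roster value in $M (assumed logistic shape)."""
    return 5.0 + 1.0 * wins + 130.0 / (1.0 + math.exp(-(wins - 90) / 3.0))


def expected_value(projected_wins, sd=8.0):
    """Ex ante roster value: probability-weighted payoff over 0..162 wins."""
    weights = [math.exp(-0.5 * ((w - projected_wins) / sd) ** 2) for w in range(163)]
    total = sum(weights)
    return sum(wt / total * roster_value(w) for w, wt in enumerate(weights))


def ex_post_marginal(wins):
    """Ex post win curve: value of one more actual win."""
    return roster_value(wins + 1) - roster_value(wins)


def ex_ante_marginal(projected_wins, sd=8.0):
    """Ex ante win curve: value of one more projected win."""
    return expected_value(projected_wins + 1, sd) - expected_value(projected_wins, sd)


ex_post_peak = max(ex_post_marginal(w) for w in range(60, 101))
ex_ante_peak = max(ex_ante_marginal(x) for x in range(60, 101))
print(round(ex_post_peak, 2), round(ex_ante_peak, 2))
print(round(ex_post_marginal(75), 2), round(ex_ante_marginal(75), 2))
```

The ex ante peak comes out lower than the ex post peak, while a projected win
at 75 is worth more than an actual 76th win, which is the "wider and shorter"
hump Birnbaum described.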
There is a lot to observe in these new ex ante curves. First, we can see the
win curve is much flatter and has a wider hump, just as Birnbaum predicted.
Any increase in uncertainty will continue to flatten the win curve. While
the marginal wins on the hump of the win curve (between 85 and 95 wins) are
most valuable, you do not know where you will end up on the win curve before
the season. By improving your projection from 82 to 83 wins, you may end up
getting yourself some of those most valuable wins.
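We can read a lot from the new ex ante curves:
1. This win curve is flatter than the ex post version and has a wider hump,
just as Birnbaum said. Any added uncertainty flattens the win curve further.
The most valuable wins are those between 85 and 95, but you do not know
before the season where on the win curve you will finish. So improving a
projection from 82 to 83 wins can still end up buying some of those most
valuable wins.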
Furthermore, we can see this team should never pay more than about $6.5
million to add a projected win to its preseason forecast. This means it
should not pay the going market rate for most free agents despite the fact
that this team has a $210 million payoff from winning 95 games. It is not
sensible for many teams to spend significant money in free agency, especially
when they are not in a high-leverage spot on the win curve. Empirically, we
see teams generally recognize this. The most valuable wins are
worth over $10 million to this team. However, it can’t go buy these wins
with certainty.
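2. We can also see that this team should never pay more than about $6.5M for
one additional projected win (the ex ante win curve tops out at roughly
$6.5M). So it should not pay the going market rate for most free agents, even
though winning 95 games pays it $210M. For many teams, and especially those
not at a high-leverage spot on the win curve, spending heavily in free agency
does not make sense, and plenty of teams already behave this way. The most
valuable actual wins are worth more than $10M to this team, but it cannot buy
those wins with certainty.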
We can model the trade deadline by decreasing the amount of uncertainty in
the win curve. At the deadline, teams already have played over half of the
season. Therefore, they are much more certain of the value of the wins they
are acquiring. If we lower the standard deviation of the wins, we can build
a trade deadline win curve.
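We can also model the trade deadline with the win curve. By the deadline more
than half the season has been played, so much of the uncertainty has already
been resolved and the value of each win being acquired is much clearer.
Lowering the standard deviation of wins gives a trade deadline win curve.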
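The effect of lowering the standard deviation can be sketched by recomputing
the peak of the projected-win curve for several values. The roster value
curve is hypothetical and the deadline standard deviations tried here (6, 4,
2) are arbitrary picks, since the article does not state the value it used.

```python
import math


def roster_value(wins):
    """Hypothetical ex post roster value in $M (assumed logistic shape)."""
    return 5.0 + 1.0 * wins + 130.0 / (1.0 + math.exp(-(wins - 90) / 3.0))


def expected_value(projected_wins, sd):
    """Probability-weighted payoff over 0..162 wins for a given uncertainty."""
    weights = [math.exp(-0.5 * ((w - projected_wins) / sd) ** 2) for w in range(163)]
    total = sum(weights)
    return sum(wt / total * roster_value(w) for w, wt in enumerate(weights))


def peak_marginal_value(sd):
    """Highest value of one more projected win across 60..100 projected wins."""
    return max(
        expected_value(x + 1, sd) - expected_value(x, sd) for x in range(60, 101)
    )


peaks = {sd: peak_marginal_value(sd) for sd in (8.0, 6.0, 4.0, 2.0)}
for sd, peak in peaks.items():
    print(sd, round(peak, 2))
```

The peak rises as sd falls: with less uncertainty, a contender's extra
projected win is more likely to be one of the high-value actual wins.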
You can see this win curve clearly has a much larger peak than the ex ante
win curve. A team in contention may be willing to give up much more for a
projected win at the trade deadline than it would in the offseason.
(The win curve becomes a worse proxy for the outcomes that a team cares about
at the trade deadline, but this is a topic that requires a separate post.)
Adding a projected win at the trade deadline has a much higher probability
of adding the actual wins you are hoping for.
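This trade deadline win curve clearly has a much higher peak than the ex ante
win curve. A contending team will pay more for an additional projected win at
the deadline than it would in the offseason, because a projected win added at
the deadline is more likely to turn into the actual wins the team wants.
As noted in the previous paragraph, with most of the season played, some of
the uncertainty is already gone.
That said, the example is not ideal: a team that simply does not care about
the trade deadline has different preferences, and its win curve cannot be
drawn this way. We set that case aside here.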
The final issue that needs to be tackled is making this single-period model
into a multi-year model. Luckily, this shouldn't be too difficult. We can
still use the same shaped win curves for every year in the future; they just
need to be discounted.
The last step is to extend the model across multiple years, which in
principle is not hard: use the same win curves for each future year and
discount them.
There are three factors to consider when discounting these future wins:
baseball's continuing salary inflation, the interest rate, and impatience.
Baseball has seen consistent salary growth now for decades. We will label
the inflation rate as π, and the interest rate as r. The factor for
impatience will be β, where 0<β<1. We have seen many teams, most notably
Mike Ilitch's Detroit Tigers, operate with very significant impatience.
Depending on this unique preference, it can be very rational to
sacrifice the future for an extra win now.
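Discounting future wins requires a few extra factors: salary inflation π, the
interest rate r, and the owner's impatience factor β, with 0<β<1. The article
cites the Tigers as a clearly impatient team, but I think a more recent
example is Loria's Marlins and his real estate scheme (?)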
We can adjust the value of wins in future years by a factor of
[(1+π)β/(1+r)]^t, where t is the number of years we are in the future.
This equation is simply a scalar to adjust the win curve up or down. The
interest rate and β decrease the value of a future win, while (1+π)
increases how much we value a future win. In the current year, where t=0,
this scalar will just go to 1.
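The factor [(1+π)β/(1+r)]^t adjusts the value of wins t years in the future.
The impatience factor β and the interest rate r reduce the value of a future
win, while inflation raises it. When t=0, the current year, the factor equals
1, so wins keep their current-year value.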
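The scalar is simple enough to write out directly. The parameter values in
the example are made up for illustration, not estimates from the article.

```python
def discount_factor(t, inflation, interest, beta):
    """[(1 + pi) * beta / (1 + r)] ** t, the multiplier on wins t years ahead."""
    return ((1.0 + inflation) * beta / (1.0 + interest)) ** t


# Assumed inputs: 5% salary inflation, 3% interest rate, beta = 0.9.
for t in range(4):
    print(t, round(discount_factor(t, 0.05, 0.03, 0.9), 4))
```

With these inputs the factor shrinks each year because the impatience term
outweighs salary inflation; a β close enough to 1 would instead let inflation
push the value of future wins above today's.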
Finally, we can now discuss using the model for transaction analysis.
We need to have a team's win curve and its projections with and without
a specific player. Given those two things, we can precisely calculate how
much that player is worth to a team. The fundamental concept of asset
pricing theory is that price equals expected discounted payoff. We can now
calculate the expected discounted payoff of any player for a given team.
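To use the model for transaction analysis we need a team's win curve and its
projected wins with and without a given player. From those we can calculate
precisely what the player is worth to the team. This is where asset pricing
theory really comes in: price equals expected discounted payoff. We can now
compute the expected discounted payoff of any player for a given team.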
For each year the player is under contract, we take the expected value of
the roster with the player and subtract both the expected value of the
roster without the player and the player’s salary. This will give us a
net present value evaluation of any player. In simplified mathematical
terms, it’s merely:
https://i.imgur.com/cADrtrr.jpg
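For each year t the player is under contract, we compute
(expected roster value with the player - expected roster value without the
player - the player's salary) * adjustment factor
as the player's net value in year t; summing over the contract years gives
the net present value.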
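The net present value calculation can be sketched as follows. The formula
itself is only available as an image, so this follows the description in
words above, with the adjustment factor applied to the whole yearly
difference; the function name and all dollar figures are hypothetical.

```python
def player_npv(value_with, value_without, salaries, inflation, interest, beta):
    """Sum over contract years of factor**t * (with - without - salary), in $M."""
    factor = (1.0 + inflation) * beta / (1.0 + interest)
    npv = 0.0
    for t, (v_with, v_without, salary) in enumerate(
        zip(value_with, value_without, salaries)
    ):
        npv += factor ** t * (v_with - v_without - salary)
    return npv


# Hypothetical two-year deal: expected roster values with and without the player.
value_with = [150.0, 140.0]
value_without = [130.0, 125.0]
salaries = [12.0, 12.0]
print(player_npv(value_with, value_without, salaries, 0.0, 0.0, 1.0))
print(player_npv(value_with, value_without, salaries, 0.0, 0.0, 0.5))
```

The expected roster values would come from the team's own ex ante roster
value curve, so the same player can have a different net present value for
every team.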
Going back to our original example of the Chris Sale trade, we can make
no declarations from here on how much each team benefitted from the deal.
But if we had a win curve for both teams and their projections with and
without the players, we could easily find the unique dollar value of each
player in the deal to each of the two teams.
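That is probably enough math. Back to the Sale trade from the start of the
article: we cannot say here how much each team gained from the deal, because
we do not have the two teams' win curves, preferences, or projections.
With those in hand we could easily compute what each player in the trade is
worth to each team.
(...so what was the point of bringing up this example? = =)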
This model provides a simple framework to evaluate the payoff of every
potential transaction, though it does come with a few limitations.
The biggest issue is that it assumes a player will remain with the team
throughout his entire contract and for no longer. However, these sorts
of minor concerns can be accounted for manually. The question of how much
a player is worth to a given team no longer has to be a guessing game.
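In short, the model offers a simple way to evaluate the payoff of any
potential transaction rather than just adding and subtracting WAR, and it
comes with only a few limitations. The biggest one is the assumption that a
player stays with the team for exactly the length of his contract, no shorter
and no longer (no opt-outs, for example). Such details can be adjusted for by
hand, so that a player's value to a team no longer has to be a guessing game!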
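Afterword: these happen to be two fields I know well, so translating as I
read went smoothly. If anything is unclear, please ask!
One weakness, in my view, is that preferences are dynamic. In the recent
Stanton trade, for example, it was his no-trade clause and his revealed
preferences that led the Yankees to change their decision. The article does
not show how to handle a dynamic process like that; it may take game theory
or IO (industrial organization).
Another weakness is that it does not model the issues I raised earlier, aging
and injury risk, with any precision. Other statistical tools, such as
survival analysis to estimate hazards, career length, and injury probability
over time, or Black-Scholes-style pricing, would probably do better.
Then again, this level of explanatory power may already be enough!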