NDark (溺於黑暗)
2015-01-24 13:36:21 The Game Outcomes Project, Part 4: Crunch Makes Games Worse
By Paul Tozour
This article is the fourth in a 5-part series.
Part 1: The Best and the Rest is also available here: (Gamasutra)
(BlogSpot) (in Chinese)
Part 2: Building Effective Teams is available here: (Gamasutra)
(BlogSpot) (in Chinese)
Part 3: Game Development Factors is available here: (Gamasutra)
(BlogSpot) (in Chinese)
This article is Part 4, and a Chinese translation will soon be available.
Part 5 will be published in late January 2015.
For extended notes on our survey methodology, see our Methodology blog
Our raw survey data (minus confidential info) is now available here if
you'd like to verify our results or perform your own analysis.
The Game Outcomes Project team includes Paul Tozour, David Wegbreit, Lucien
Parsons, Zhenghua “Z” Yang, NDark Teng, Eric Byron, Julianna Pillemer, Ben
Weber, and Karen Buro.
The Game Outcomes Project, Part 4: Crunch Makes Games Worse
Extended overtime (“crunch”) is a deeply controversial topic in our
industry. Countless studios have undertaken crunch, sometimes extending to
mandatory 80-100 hour work weeks for years at a time. If you ask anyone in
the industry about crunch, you’re likely to hear opinions stated very
strongly and matter-of-factly based on that person’s individual experience.
And yet such opinions are almost invariably put forth with zero reference to
any actual data.
If we truly want to analyze the impact of extended overtime in any scientific
and objective way, we should start by recognizing that any individual game
project must be considered meaningless by itself – it is a single data
point, or anecdotal evidence. We can learn absolutely nothing from whether a
single successful or unsuccessful game involved crunch or not, because we
cannot know how the project might have turned out if the opposite path had
been chosen – that is, if a project that crunched had not done so, or if a
project that did not employ crunch had decided to use it.
As the saying goes, you can’t prove (or disprove) a counterfactual – you’d
need a time machine to actually know how things would have turned out if
you’d chosen differently.
Furthermore, there have undeniably been many successful and unsuccessful
games created both with and without crunch. So we can’t give crunch the
exclusive credit or blame for a particular outcome on a single project when
much of the credit or blame is clearly owed to other aspects of the game’s
development. To truly measure the effect of crunch, we would need to look at
a large sample, ideally involving hundreds of game projects.
Thankfully, the Game Outcomes Project survey has given us exactly that. In
previous articles, we discussed the origin of the Game Outcomes Project and
our preliminary findings, and our findings related to team effectiveness and
many additional factors we looked at specific to game development. We also
wrote up a separate blog post describing the technical details of our
methodology.
In this article, we present our findings on extended overtime based directly
on our survey data.
Attitudes Toward Crunch
Developers have surprisingly divergent attitudes toward the practice of
crunch. An interview on gamesindustry.biz quoted well-known industry figures
Warren Spector and Jason Rubin:
“Crunch sucks, but if it is seen by the team members as a fair cost of
participating in an otherwise fantastic employment experience, if they value
ownership of the resulting creative success more than the hardship, if the
team feels like long hours of collaboration with close friends is ultimately
rewarding, and if they feel fairly compensated, then who are we to tell them
otherwise?" asked Rubin.
[…] "Look, I'm sure there have been games made without crunch. I've never
worked on one or led one, but I'm sure examples exist. That tells me
something about myself and a lot about the business I'm in," said Spector.
[…] "What I'm saying is that games - I'm talking about non-sequels,
non-imitative games - are inherently unknowable, unpredictable, unmanageable
things. A game development process with no crunch? I'm not sure that's
possible unless you're working on a rip-off of another game or a low-ambition
“[…] Crunch is the result of working with a host of unknown factors in
creative mediums. Since game development is always full of unknowns, crunch
will always exist in studios that strive for quality […] After 30 years of
making games I'm still waiting to find the wizard who can avoid crunch
entirely without compromising at a level I'm unwilling to accept.”
On the other side of the fence is Derek Paxton of Stardock, who said in an
interview with Gameranx:
“Crunch makes zero sense because it makes games worse. Companies crunch to
push through on a specific game, but the long-term effect is that talented
developers, artists, producers and designers burn out and leave the industry.
“Companies and individuals should stop wearing their time spent crunching as
a badge of honor. Crunch is a symptom of broken management and process.
Crunch is the sacrifice of your employees. I would ask them why crunch isn’t
an issue with other industries. Why isn’t crunch an issue at all game
“Employees should see it as a failure. Gamers should be concerned about it,
because in the long term the hobby they love is losing talent because of it.
Companies should do everything in their power to improve their processes to
avoid these consequences.”
So who is right – Spector and Rubin, or Paxton?
[Full disclosure: team member Paul Tozour leads Mothership Entertainment,
whose flagship game is being published by Stardock.]
In the Game Outcomes Project survey, we provided 3 text boxes at the end that
respondents could use to tell us about their industry experiences. Where
they mention crunch, they invariably mention it as a net negative. One
respondent wrote:
“The biggest issue we had was that the lead said ‘Overtime is part of game
development’ and never TRIED to improve. As sleep was lost, motivation
dropped and the staff lost hope ... everything fell apart. Hundred-hour
weeks for nine months, and I'm not exaggerating. Humans can't function under
these conditions ... If you want to mention my answer feel free. I'm sure
it'd be familiar to many devs.”
Another developer put it more bluntly:
“Schedule 40 hours a week and you get 38. Schedule 50 and you get 39 and
everyone hates work, life, and you. Schedule 60 and you get 32 and wives
start demanding you send out resumes. Schedule 80 and you’re [redacted] and
get sued, jackass.”
In this article, we will be getting a final word on the subject from the one
source that has yet to be interviewed: the data.
The “Extraordinary Effort” Argument
We’ll begin by formulating the “pro-crunch” side of the discourse into
testable hypotheses. Although no one directly claims that crunch is good per
se, and no one denies that it can have harmful effects, Spector and Rubin
clearly make the case in the article above that crunch is often (if not
usually, or even always) a necessary evil.
According to this line of thinking, ordinary development with ordinary
schedules cannot produce extraordinary results. We believe an accurate
characterization of this viewpoint from the gamesindustry.biz article quoted
above would be: “Extraordinary results require extraordinary effort, and
extraordinary effort demands long hours.”
This position (we’ll call it the “extraordinary effort argument”) leads
directly to two falsifiable hypotheses:
1. If the “extraordinary effort argument” is correct, there should be a
positive correlation between crunch and game outcomes, and higher levels of
crunch should show a measurable improvement in the outcomes of game projects.
2. If the “extraordinary effort argument” is correct, there should be
relatively few, if any, highly successful projects without crunch.
Luckily for us, we have data from hundreds of developers who took our survey
with no preconceptions as to what the study was designed to test, and which
we can use to verify both of these statements. We’ll agree to declare
victory for the pro-crunch side if EITHER of these hypotheses remains
standing after we put it in the ring with our data set.
Crunching the Numbers
We’ll approach our analysis in several phases, carefully determining what
the data does and does not tell us.
Our 2014 survey asked the following five questions related to crunch, which
were randomly scattered throughout the survey:
#“I worked a lot of overtime or ‘crunched’ on this project.”
#“I often worked overtime because I was required or felt pressured to.”
#“Our team sometimes seemed to be stuck in a cycle of never-ending crunch
/ overtime work.”
#“If we worked overtime, I believe it was because studio leaders or
producers failed to scope the project properly (e.g. insufficient manpower,
deadlines that were too tight, over-promised features).”
#“If I worked overtime, it was only when I volunteered to do so.”
Here’s how the answers to those questions correlate with our aggregate
project outcome score (described on our Methodology page). On the horizontal
axis, a score of -1.0 is “disagree completely” and a score of +1.0 is
“agree completely.”
Figure 1. Correlation of each crunch-related question with that project’s
actual outcome (aggregate score). Each of the 5 questions is shown, as an
animated GIF with a 4-second delay. Only the horizontal axis changes.
The correlations are as follows: -0.24, -0.30, -0.47, -0.36, +0.36 (in the
same order listed in the bullet-pointed list above). All five of these
correlations have statistical p-values well below 0.001, indicating that they
are statistically significant. Note how all the correlations are strongly
negative except for the final question, which asked whether crunch was solely
voluntary.
“But wait,” a proponent of crunch might say. “Surely that’s only because
you’re using a combined score. That score combines the values of questions
like ‘this project met its internal goals,’ which are going to give you
lower values, because they're subjective fluff. Of course people who are
unhappy about crunch are going to give that factor low scores – and that’s
going to lower the combined score a lot. It’s a fudge factor, and it’s
skewing your results. Throw it out! You should throw away the critical
success, delays, and internal goals outcomes and JUST look at return on
investment and I bet you’ll see a totally different picture.”
OK, let’s do that:
Figure 2. Correlation of each of the 5 crunch-related questions with that
project’s return on investment (ROI). As with Figure 1, each of the 5
questions is shown, as an animated GIF with a 4-second delay. Only the
horizontal axis changes. Note that many of the points shown represent
multiple coincident points. See our Methodology page for an explanation of
the vertical axis scale.
Notice how the lines have essentially the same slopes as in the previous
figure. The correlations with ROI are as follows (in the same order): -0.18,
-0.26, -0.34, -0.23, and +0.28. All of these correlations have p-values
below 0.012.
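For readers who want to run this kind of check on the raw survey data, here is a minimal sketch of a Pearson correlation with a permutation-based p-value, using only the Python standard library. The function names and the tiny data set are invented for illustration; this is not necessarily the procedure or software the team used, and the numbers below are not survey results.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one correction so p is never exactly 0


# HYPOTHETICAL data, invented for illustration (not from the survey):
# agreement with a crunch statement on the -1.0 .. +1.0 scale, and an outcome score.
agreement = [-1.0, -0.6, -0.3, 0.0, 0.3, 0.6, 1.0, 1.0]
outcome = [85, 70, 75, 60, 55, 58, 40, 35]

r = pearson_r(agreement, outcome)
p = permutation_p_value(agreement, outcome)
print(f"r = {r:+.2f}, p = {p:.4f}")
```

A permutation test is used here only because it needs no external libraries; a standard t-based test for a correlation coefficient would serve the same purpose.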
Still not convinced? Here are the same graphs again, correlated against
aggregate reviews / MetaCritic scores.
Figure 3. Correlation of each of the 5 crunch-related questions with the
project’s aggregate reviews / MetaCritic score (note that the vertical axis
does not represent actual MetaCritic scores but is a normalized
representation of the answers to this question; see our Methodology page for
more info). As with Figures 1 and 2, each of the 5 questions is shown, as an
animated GIF with a 4-second delay. Note that many of the points shown
represent multiple coincident points. Only the horizontal axis changes.
The results are essentially identical, and all have p-values under 0.05.
So if our combined score has a negative correlation with ALL our crunch
questions except the one about crunch being purely voluntary (which itself
does not imply any particular level of crunch), that means that we’ve
disproven the first part of the “extraordinary effort argument” – the
correlation is clearly negative, not positive.
Now let’s look at the second testable hypothesis of the “extraordinary
effort argument.”
In Figure 4 (below), we’re looking at the two most relevant questions
related to overall crunch for a project. The vertical axis is the aggregate
outcome score, while the horizontal axis represents the scale from “disagree
completely” (-1) to “agree completely” (+1). The black lines are trend lines.
As you can see, in both cases, higher agreement with each statement
corresponds to inferior project outcomes.
Figure 4. The two most relevant questions related to crunch compared to the
aggregate project outcome score.
We’ve added horizontal blue and orange lines to both images. The blue line
represents a score of 80, which will be our subjective threshold for “very
successful” projects. The orange line represents a score of 40, which will
be our threshold for “very unsuccessful” projects.
The dots above the blue line tell a clear story: in each case, there were
more successful games made without crunch than with crunch.
However, these charts don’t tell the full story by themselves; many of the
data points are clustered at the exact same spot, meaning that each dot can
actually represent several data points. So a statistical deep-dive is
necessary. We’re particularly interested in the four corners of the chart –
the data points above the blue line on the extreme left and right sides of
each chart (below -0.6 and above +0.6 on the horizontal axis) and below the
orange line on the left and right sides.
Looking solely at the chart on the top of Figure 4 (“I worked a lot of
overtime or ‘crunched’ on this project”), we observed the following
pattern. Note that the percentages are given in terms of the total data
points in each vertical grouping (under -0.6 or above 0.6 on the horizontal
axis).
We can see clearly that a higher percentage of no-crunch projects succeed
than fail (17% vs 10%), and a much larger percentage of high-crunch projects
fail than succeed (32% vs 13%). Additionally, because these percentages are
taken within each group, no-crunch projects are more likely than high-crunch
projects to be very successful (17% vs 13%), while high-crunch projects are
more likely than no-crunch projects to be very unsuccessful (32% vs 10%).
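The corner tallies described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `corner_rates` and the sample rows are hypothetical, the thresholds (-0.6 and +0.6 on the agreement axis, 80 and 40 on the outcome score) come from the text, and the use of strict comparisons is an assumption rather than something the article specifies.

```python
def corner_rates(rows, low=-0.6, high=0.6, success=80, failure=40):
    """Share of very successful / very unsuccessful projects in each extreme group.

    rows: iterable of (agreement, outcome_score) pairs.
    Percentages are relative to the size of each group, not to all respondents.
    """
    rows = list(rows)  # allow generators, since we iterate more than once
    groups = {
        "no_crunch": [score for agree, score in rows if agree < low],
        "high_crunch": [score for agree, score in rows if agree > high],
    }
    rates = {}
    for name, scores in groups.items():
        total = len(scores)
        if total == 0:
            rates[name] = None  # no respondents in this group
            continue
        rates[name] = {
            "very_successful_pct": 100 * sum(s > success for s in scores) / total,
            "very_unsuccessful_pct": 100 * sum(s < failure for s in scores) / total,
        }
    return rates


# HYPOTHETICAL rows, invented for illustration (not survey data).
sample = [
    (-1.0, 90), (-1.0, 65), (-0.8, 85), (-0.8, 30),  # strong disagreement: little crunch
    (0.0, 70), (0.2, 55),                            # middle answers are ignored
    (0.8, 82), (1.0, 35), (1.0, 20), (1.0, 60),      # strong agreement: heavy crunch
]
print(corner_rates(sample))
```

Computing each percentage within its own group is what makes the two sides comparable even when far more respondents report crunch than not.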
Here’s the same chart, but this time looking at the bottom question, “Our
team sometimes seemed to be stuck in a cycle of never-ending crunch /
overtime work.”
These results are even more remarkable. The respondents who answered
“disagree strongly” or “disagree completely” were 2.5 times more likely to
be working on very successful projects (23% vs 9%), while the respondents who
answered “agree strongly” or “agree completely” were, incredibly, more
than 10 times more likely to be on unsuccessful projects than successful ones
(41% vs 4%).
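The ratios quoted above follow from the percentages in the text. A quick check, assuming (as the parallel wording suggests) that 9% is the share of very unsuccessful projects within the "disagree" group; the variable names are only for illustration:

```python
# Percentages quoted in the article for the "never-ending crunch" question.
# ASSUMPTION: each pair is (very successful %, very unsuccessful %) within one group.
disagree_success_pct, disagree_failure_pct = 23, 9
agree_success_pct, agree_failure_pct = 4, 41

# Respondents who disagreed: success relative to failure.
low_crunch_ratio = disagree_success_pct / disagree_failure_pct
# Respondents who agreed: failure relative to success.
high_crunch_ratio = agree_failure_pct / agree_success_pct

print(f"{low_crunch_ratio:.2f}")
print(f"{high_crunch_ratio:.2f}")
```

Under that reading, the first ratio is a little over 2.5 and the second is a little over 10, in line with the article's statements.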
Some might object to this way of measuring the responses, as it is an
aggregate outcome score which takes internal achievement of the project goals
into account – and this is a somewhat subjective measure. What if we looked
at return on investment (ROI) alone? Surely that would paint a different
picture?
Here is ROI:
Figure 5. The two most relevant questions related to crunch compared to
return on investment (ROI).
The first question (top chart) gives us the following results:
The second question (bottom chart) gives us:
These results are essentially equivalent to what we got with Figure 4