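※ Quoting matt0730 (getting up on stage to speak really is a bit nerve-racking):
: Government agencies, state-owned banks, vendors, and so on
: spend all day shooting their mouths off about doing big data and social media
: If you don't do it, you have nothing to talk about, you're behind, you're not innovative
: Yet for plenty of industries the core value is face-to-face professional service, not so-called online-to-offline digital marketing
: The bosses don't even have accounts on ptt, mobile01, or fb
: Deep down they actually think whatever young people say and do is crap
: All they do is make public statements about valuing big data and listening to the community, and every time someone else has to draft them
: Of course the boss must be getting pushed by his own boss, and only does this stuff reluctantly
: Strange, so who is the boss's boss's boss...?
: Any gossip on "the boss is brain-dead, the boss's boss is brain-dead, the boss's... is brain-dead too"?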
: Big Data Is Big Shit
^^^^^^^^^^^^^^^^^^^^
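Hello to all you little sisters, pavone, 30cm, E cup, winners, the winning team,
the tall-rich-handsome, and the truly mighty!
Hi everybody! Howdy all! AV8D!
According to a friend of this loser, big data is not big shit; it is well worth
studying! It's just that being told to switch to big data no matter what field
you originally worked in, now that is the big shit.
This pollo never understood what big data was until that friend explained it, and
now has a slight feel for it. The article below is for reference, with a rough
summary after each part: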
http://cacm.acm.org/blogs/blog-cacm/155468-what-does-big-data-mean/fulltext
... big data can mean one of four things:
... (big data comes in the following four kinds)
Big volumes of data, but "small analytics." Here the idea is to support SQL
on very large data sets. Nobody runs "Select*" from something big as this
would overwhelm the recipient with terabytes of data. Instead, the focus is
on running SQL analytics (count, sum, max, min, and avg with an optional
group_by) on large amounts of data. I term this "small analytics" to
distinguish this use case from the one which follows.
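The first kind is large data with small analytics: supporting SQL (the database
query language) queries over large amounts of data, such as sum, max, min, or the
average within some group.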
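Not from the article, just a minimal sketch of what "small analytics" could look like, using Python's built-in sqlite3 on a tiny in-memory table. The `sales` table, its columns, and the sample rows are all made up for illustration; a real big-volume setup would run the same kind of query on a much larger engine.

```python
import sqlite3

def small_analytics(rows):
    """Run count/sum/max/min/avg with GROUP BY over (region, amount) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, invented for this example.
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT region, COUNT(*), SUM(amount), MAX(amount), MIN(amount), AVG(amount)
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

rows = [("north", 10.0), ("north", 30.0), ("south", 5.0)]
for line in small_analytics(rows):
    print(line)
```

The point is that the answer is a handful of summary rows, not the raw data, which is why nobody needs "Select*" here.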
Big analytics on big volumes of data. By big analytics, I mean data
clustering, regressions, machine learning, and other much more complex
analytics on very large amounts of data. At the present time users tend to
run big analytics using statistical packages, such as R, SPSS and SAS.
Alternately, they use linear algebra packages such as ScalaPack or Arpack.
Lastly, there is a fair amount of custom code (roll your own) used here.
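The second kind is large data with big analytics: running clustering, regression,
all kinds of machine learning, statistical packages, and so on over large amounts
of data.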
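As a toy illustration of the "regression" item (an assumption-laden sketch, not anything the article provides), here is ordinary least squares for a single variable in plain Python. The function name `fit_line` and the sample points are invented; at real scale people would reach for the statistical or linear algebra packages the article mentions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```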
Big velocity. By this I mean being able to absorb and process a fire hose of
incoming data for applications like electronic trading, real-time ad
placement on Web pages, real-time customer targeting, and mobile social
networking. This use case is most prevalent in large Web properties and on
Wall Street, both of whom tend to roll their own.
The third kind is handling data that pours in fast, ideally in real time.
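One way to picture "absorb and process" as the data arrives: the sketch below keeps a rolling mean over the last few values of a stream and emits an updated answer for every incoming item, without storing the whole stream. The name `rolling_mean` and the window size are assumptions made up for this example; real trading or ad systems are far more involved.

```python
from collections import deque

def rolling_mean(stream, window):
    """Yield the mean of the most recent `window` values after each new value."""
    buf = deque()
    total = 0.0
    for value in stream:
        buf.append(value)
        total += value
        if len(buf) > window:
            # Drop the oldest value so memory stays bounded.
            total -= buf.popleft()
        yield total / len(buf)

# Any iterable works as the "fire hose"; here it is a short made-up list.
for mean in rolling_mean([1, 2, 3, 4], window=2):
    print(mean)
```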
Big variety. Many enterprises are faced with integrating a larger and larger
number of data sources with diverse data (spreadsheets, Web sources, XML,
traditional DBMSs). Many enterprises view this as their number one headache.
Historically, the extract, transform, and load (ETL) vendors serviced this
market on modest numbers of data sources.
The fourth kind is data from many sources or of many types, which gives a lot of
enterprises a serious headache.
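To make the "variety" headache concrete, here is a small sketch that pulls the same kind of record out of CSV, JSON, and XML and maps each onto one common shape. Every field name (`name`, `amount`, `customer`, `total`, `who`, `amt`) and all sample data are invented for illustration; the pain in practice is that each new source needs its own mapping like these.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def from_csv(text):
    """CSV source with hypothetical columns: name, amount."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "amount": float(r["amount"])} for r in reader]

def from_json(text):
    """JSON source with hypothetical keys: customer, total."""
    return [{"name": r["customer"], "amount": float(r["total"])}
            for r in json.loads(text)]

def from_xml(text):
    """XML source with hypothetical <order who="..."><amt>...</amt></order>."""
    root = ET.fromstring(text)
    return [{"name": o.get("who"), "amount": float(o.findtext("amt"))}
            for o in root.iter("order")]

def integrate(csv_text, json_text, xml_text):
    """Combine all three sources into one list with a shared schema."""
    return from_csv(csv_text) + from_json(json_text) + from_xml(xml_text)

records = integrate(
    "name,amount\nann,1.5\n",
    '[{"customer": "bob", "total": 2}]',
    '<orders><order who="cy"><amt>3</amt></order></orders>',
)
print(records)
```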
In summary, big data can mean big volume, big velocity, or big variety. In
the remainder of this post, I talk about small analytics on big volumes of
data. In three subsequent posts, I will discuss the other three problem
areas.
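Now for the main point, but this loser's friend hasn't told this loser what that
part is about yet, so the rest is omitted.
This loser's friend admits he only clicked through out of curiosity because the
article's author is the 2014 Turing award winner. This loser absolutely does not
admit that this loser is this loser's friend.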