[Question] How do I get context vectors out of text2vec?

Author: augustana (微小的希望)   2017-07-26 18:09:02
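[Question type]:
Programming help (I want to do something in R, but I don't know how to write it in R)
[Software familiarity]:
Beginner (I have programmed in other languages, but I am not familiar with R's syntax)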
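[Problem description]:
I want to use text2vec to turn text into vectors,
but I can't get at its word vectors or its context vectors.
It seems this is because both vectors live in the private section of the glove object (an environment),
so nothing I write can pull them out.
When I type glove$, the completion menu R shows lists only the public members.
The object is printed below; w_i and w_j are the vectors I want to extract: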
<GloVe>
Inherits from: <word_embedding_model>
Public:
clone: function (deep = FALSE)
dump_every_n: 0
dump_model: function ()
fit: function (x, n_iter, convergence_tol = -1, ...)
get_history: function ()
get_word_vectors: function ()
initialize: function (word_vectors_size, vocabulary, x_max, learning_rate = 0.15,
shuffle: FALSE
verbose: TRUE
Private:
alpha: 0.75
b_i: -0.088758796453476 -0.200479492545128 -0.276277631521225 ...
b_j: 0.158077865839005 0.00269329198636115 -0.506954908370972 ...
cost_history: 0.0582876185658562 0.0376007230450009 0.0264438356106707 ...
fitted: TRUE
glove_fitter: Rcpp_GloveFitter
grain_size: 100000
initial: NULL
internal_matrix_format: dgTMatrix
lambda: 0
learning_rate: 0.15
max_cost: 10
vocab_terms: 3000 體型 較 呵護 m62 護膚 新生兒 s12 級 白金 特極 頂級 ...
w_i: 0.194914728403091 -0.0265734232962132 -0.611702501773834 ...
w_j: 0.273217976093292 -0.193755224347115 0.475706458091736 0 ...
word_vectors_history: NULL
word_vectors_size: 50
x_max: 10
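Some background on the symptom above: text2vec's models are R6 objects (the `Inherits from` / `Public` / `Private` layout is R6's print format, and R6 is among the loaded namespaces in the environment listing), and R6 keeps private fields in a separate environment that `$` and tab completion do not reach. A minimal sketch on a hypothetical toy class (`ToyGlove` is illustrative only, not part of text2vec) reproduces this and shows the common but unsupported workaround of reading `.__enclos_env__$private`. Reaching into internals like this can break between package versions; the public `get_word_vectors()` method in the dump looks like the intended accessor, although what it returns in text2vec 0.4.0 is not shown here.

```r
library(R6)

# Hypothetical toy class standing in for the fitted GloVe model:
# both embedding matrices sit in the private section, like w_i / w_j above.
ToyGlove <- R6Class("ToyGlove",
  public = list(
    initialize = function(terms, size = 3L) {
      # one row per term (toy layout, not necessarily text2vec's)
      private$w_i <- matrix(runif(length(terms) * size), nrow = length(terms),
                            dimnames = list(terms, NULL))
      private$w_j <- matrix(runif(length(terms) * size), nrow = length(terms),
                            dimnames = list(terms, NULL))
    }
  ),
  private = list(w_i = NULL, w_j = NULL)
)

model <- ToyGlove$new(c("a", "b"))

# `$` only sees public members, so this is NULL rather than the matrix
print(is.null(model$w_i))

# Every R6 object carries its enclosing environment; its `private`
# binding holds the private fields (an internal, unsupported path)
priv <- model$.__enclos_env__$private
print(ls(priv))

# Main vectors plus context vectors, the combination the post is after
word.vec <- priv$w_i + priv$w_j
print(dim(word.vec))
```

The toy stores one row per term purely for readability; the orientation of w_i and w_j inside text2vec is not visible in the dump, so checking `dim()` before adding them is a sensible precaution.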
[Code example]:
library(text2vec)
# article_list is my own data (not shown)
keyword <- as.character(article_list$keyword[1])
keyword <- enc2utf8(keyword) # convert to UTF-8
keyword <- strsplit(keyword, ',')
######## Count unique terms
# iterator over the tokens
token <- itoken(keyword)
# build the vocabulary of unique terms
vocab <- create_vocabulary(token, ngram = c(1, 1)) # term, count, share of documents
## keep only terms that occur 5 or more times
#vocab <- prune_vocabulary(vocab, term_count_min = 5L)
######## Vectorization
# vectorization of words
vectorizer <- vocab_vectorizer(vocab,
                               grow_dtm = FALSE,       # don't build a document-term matrix
                               skip_grams_window = 5L) # use a window of 5 for context words
# tcm = term co-occurrence matrix
tcm <- create_tcm(token, vectorizer)
# fit the GloVe model (factorizes the TCM)
glove <- GlobalVectors$new(word_vectors_size = 50, vocabulary = vocab, x_max = 10)
glove$fit(tcm, n_iter = 20)
# word vectors
word.vec <- glove$word_vectors$w_i +  # word (main) vectors
            glove$word_vectors$w_j    # context vectors
That last line is where it goes wrong.
I don't know whether it fails because glove() was removed from text2vec
and replaced with GlobalVectors().
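For intuition about what gets factorized: the `create_tcm()` step in the code above, with `skip_grams_window = 5L`, builds a term co-occurrence matrix (TCM), i.e. for each pair of terms, how often they occur within 5 tokens of each other; GloVe then learns w_i and w_j from it. The sketch below shows the idea with plain, unweighted, symmetric counts using a hypothetical helper `count_cooccurrence()` (not a text2vec function). text2vec's own implementation may weight pairs by distance and store the result as a sparse matrix, so this is only an illustration of the concept.

```r
# Hypothetical helper, for intuition only: plain symmetric co-occurrence
# counts of term pairs that appear within `window` tokens of each other.
count_cooccurrence <- function(docs, window = 5L) {
  terms <- sort(unique(unlist(docs)))
  tcm <- matrix(0, nrow = length(terms), ncol = length(terms),
                dimnames = list(terms, terms))
  for (doc in docs) {
    n <- length(doc)
    if (n < 2L) next
    for (i in seq_len(n - 1L)) {
      # pair token i with each following token inside the window
      for (j in (i + 1L):min(n, i + window)) {
        tcm[doc[i], doc[j]] <- tcm[doc[i], doc[j]] + 1
        tcm[doc[j], doc[i]] <- tcm[doc[j], doc[i]] + 1
      }
    }
  }
  tcm
}

# Same input shape as in the post: strsplit() returns a list of token vectors
docs <- strsplit(c("a,b,c", "a,c"), ",")
m1 <- count_cooccurrence(docs, window = 1L)
m5 <- count_cooccurrence(docs, window = 5L)
print(m1)
print(m5)
```

With a window of 1 only adjacent tokens count, so "a" and "c" co-occur once (from the second document); with a window of 5 the first document contributes as well.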
[Environment]:
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950 LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringi_1.1.2 RODBC_1.3-14 text2vec_0.4.0
loaded via a namespace (and not attached):
[1] compiler_3.4.0 magrittr_1.5 R6_2.2.0 Matrix_1.2-9 tools_3.4.0
[6] Rcpp_0.12.11 codetools_0.2-15 grid_3.4.0 iterators_1.0.8 foreach_1.4.3
[11] data.table_1.10.4 digest_0.6.10 RcppParallel_4.3.20 lattice_0.20-35
[Keywords]:
text2vec, environment, private
