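I'll answer my own question.
By default, CountVectorizer drops tokens that are only one character long, because its default token_pattern, (?u)\b\w\w+\b, requires at least two word characters. Passing a pattern that also accepts a single word character keeps them: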
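from sklearn.feature_extraction.text import CountVectorizer

text = [
    "我|,|愛你|白Z",
    "他|愛狗",
    "貓|愛鼠",
]
# \w+ (instead of the default \w\w+) keeps single-character tokens such as "我".
vectorizer = CountVectorizer(min_df=1, token_pattern=r'(?u)\b\w+\b')
vectorizer.fit(text)
vector = vectorizer.transform(text)
print(vectorizer.vocabulary_)  # token -> column index
print(vector.shape)            # (number of documents, vocabulary size)
print(vector.toarray())        # per-document token counts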
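
To see the difference between the two patterns without running scikit-learn, here is a minimal sketch that applies them directly with Python's re module. The helper name tokenize and the two constant names are my own, and the lowercasing step only approximates CountVectorizer's default lowercase=True behavior; it is not the library's actual tokenizer.

```python
import re

# Default token_pattern of CountVectorizer: two or more word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Pattern used in the answer: one or more word characters.
SINGLE_CHAR_PATTERN = r"(?u)\b\w+\b"


def tokenize(doc, pattern):
    # Hypothetical helper: lowercase first (mirroring lowercase=True),
    # then collect every match of the token pattern.
    return re.findall(pattern, doc.lower())


doc = "我|,|愛你|白Z"
default_tokens = tokenize(doc, DEFAULT_PATTERN)
custom_tokens = tokenize(doc, SINGLE_CHAR_PATTERN)

print(default_tokens)  # the single-character token "我" is missing
print(custom_tokens)   # "我" is kept
```

With the default pattern the single-character token "我" is dropped, while the relaxed pattern keeps it. The "|" and "," characters are not word characters, so they act only as separators in both cases.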