[Memo] Text Analysis in R (Data Analysis, Part 1)

December 20, 2021, 23:20

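This is a continuation of the previous article.

First, let's count how many times each extracted morpheme appears. Here only nouns are kept, and each term is counted per sentence.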

# Count each noun per sentence
dat_count <- dat %>%
  filter(class == "名詞") %>%
  group_by(term, sentences) %>%
  mutate(wordCount = n()) %>%
  distinct()  # drop the duplicated rows
head(dat_count)

Running this produces the following.

# A tibble: 6 × 4
# Groups:   term, sentences [6]
  term            class  sentences  wordCount
  <chr>           <chr>      <int>      <int>
1 メロス          名詞           1         10
2 激怒            名詞           1          1
3 邪智            名詞           1          1
4 暴虐            名詞           1          1
5 ゃちぼうぎゃく  名詞           1          1
6 王              名詞           1          1
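
As a rough cross-check of the counting step, here is a minimal sketch on a toy table. The `toy` data below is made up for illustration and is not the real `dat`; it only assumes the same three columns (`term`, `class`, `sentences`). It also tries `dplyr::count()` as one possible shorter way to get the same rows.

```r
library(dplyr)

# Toy stand-in for `dat` (hypothetical values, not the real data)
toy <- tibble(
  term      = c("メロス", "王", "メロス", "激怒", "メロス", "走る"),
  class     = c("名詞", "名詞", "名詞", "名詞", "名詞", "動詞"),
  sentences = c(1L, 1L, 1L, 1L, 2L, 2L)
)

# Approach used in the text: add a per-group count, then drop duplicate rows
by_mutate <- toy %>%
  filter(class == "名詞") %>%
  group_by(term, sentences) %>%
  mutate(wordCount = n()) %>%
  distinct() %>%
  ungroup()

# Alternative: count() groups, tallies, and ungroups in one call
by_count <- toy %>%
  filter(class == "名詞") %>%
  count(term, class, sentences, name = "wordCount")

# Compare the two results after putting the rows in the same order
same_rows <- isTRUE(all.equal(
  arrange(by_mutate, term, sentences),
  arrange(by_count, term, sentences)
))
print(by_count)
print(same_rows)
```

Note that `count()` returns an ungrouped table, whereas the version in the text stays grouped by `term` and `sentences`.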

Next, let's build a matrix to display the results: one row per sentence and one column per term.

# Spread terms into columns; terms absent from a sentence get 0
dat_BoW <- dat_count %>%
  select(-class) %>%
  pivot_wider(names_from = term, values_from = wordCount, values_fill = list(wordCount = 0))
head(dat_BoW)

Running this displays the following result.

# A tibble: 6 × 720
# Groups:   sentences [6]
  sentences メロス  激怒  邪智  暴虐 ゃちぼうぎゃく    王  決意  政治    村  牧人    笛    羊  邪悪 人一倍  敏感
      <int>  <int> <int> <int> <int>          <int> <int> <int> <int> <int> <int> <int> <int> <int>  <int> <int>
1         1     10     1     1     1              1     1     1     1     3     2     1     1     1      1     1
2         2      0     0     0     0              0     0     0     0     0     0     0     0     0      0     0
3         3      0     0     0     0              0     0     0     0     0     0     0     0     0      0     0
4         4      0     0     0     0              0     0     0     0     0     0     0     0     0      0     0
5         5      0     0     0     0              0     0     0     0     0     0     0     0     0      0     0
6         6      0     0     0     0              0     0     0     0     0     0     0     0     0      0     0

That gives us a word-count matrix (sentences by terms).

Next, let's make the word cloud you see everywhere.
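
To make the reshaping step easier to follow, here is a small sketch on toy counts. The `toy_count` values are invented for illustration; the column names simply mirror `dat_count` without `class`. It also shows one possible way to turn the wide table into a plain numeric matrix, which is an addition and not something done in this memo. The scalar form `values_fill = 0L` assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Toy per-sentence counts (hypothetical values)
toy_count <- tibble(
  term      = c("メロス", "王", "メロス", "村"),
  sentences = c(1L, 1L, 2L, 2L),
  wordCount = c(2L, 1L, 1L, 1L)
)

# One row per sentence, one column per term, 0 where a term is absent
toy_BoW <- toy_count %>%
  pivot_wider(names_from = term, values_from = wordCount, values_fill = 0L)

# Drop the sentence id column and keep only the counts as a matrix
bow_mat <- toy_BoW %>%
  select(-sentences) %>%
  as.matrix()
rownames(bow_mat) <- toy_BoW$sentences

print(bow_mat)
print(colSums(bow_mat))  # total occurrences of each term across sentences
```

Keeping the sentence id as row names rather than as a column means functions such as `colSums()` operate on the counts only.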

library(wordcloud)
# Tabulate noun frequencies and draw the words that appear at least 3 times
dat %>%
  filter(class == "名詞") %>%
  select(term) %>%
  table() %>%
  wordcloud(words = names(.), freq = ., min.freq = 3)

Running this gives...

Lots of tofu (empty boxes in place of the Japanese characters)!!!
Specifying a font and running it again...

par(family = "HiraKakuProN-W3")  # Hiragino, a Japanese-capable font on macOS
dat %>%
  filter(class == "名詞") %>%
  select(term) %>%
  table() %>%
  wordcloud(words = names(.), freq = ., min.freq = 3)

It worked!
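
To check which words actually end up in the cloud, the frequencies can be tabulated as an ordinary data frame before plotting. This is only a sketch on made-up data: `toy`, `freq_tbl`, and `freq_top` are hypothetical names, and the cutoff of 3 just mirrors `min.freq = 3` above.

```r
library(dplyr)

# Toy stand-in for `dat` (hypothetical values)
toy <- tibble(
  term  = c(rep("メロス", 4), rep("王", 3), rep("村", 2), "笛"),
  class = "名詞"
)

# Frequency of each noun, most frequent first
freq_tbl <- toy %>%
  filter(class == "名詞") %>%
  count(term, name = "freq", sort = TRUE)

# Same cutoff as min.freq = 3, made explicit
freq_top <- freq_tbl %>%
  filter(freq >= 3)

print(freq_top)

# The filtered table could then be passed to wordcloud(), for example:
# wordcloud(words = freq_top$term, freq = freq_top$freq, min.freq = 1)
```

Having the table in hand makes it easy to see why a word is missing from the plot: it either fell below the cutoff or was not tagged as a noun.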
