ggplot histograms: controlling the stacking order

June 13, 2022 14:44

Histogram types

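Specify "identity", "stack", or "fill" as the position argument, depending on what you want to show.
Note that the result is completely different depending on whether the x axis is mapped to a continuous or a discrete variable: geom_histogram() bins a continuous x, while a discrete x has to be counted per category with geom_bar() instead.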

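identity: actual values; the groups overlap, each drawn from zero
stack: stacked graph (the default for geom_histogram())
fill: proportions; the stack in each bin is scaled to a total of 1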

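# p: an existing ggplot() with a continuous x and a grouping variable mapped to fill; pick one of the following
p <- p + geom_histogram(aes(fill = sign), position = "identity", alpha = 0.3, bins = 120)  # overlapping groups
p <- p + geom_histogram(bins = 50, position = "fill", alpha = 0.9)   # share of each group per bin
p <- p + geom_histogram(bins = 80, position = "stack", alpha = 0.9)  # stacked counts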
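The three positions can be compared on made-up data. This is a minimal sketch, assuming ggplot2 is installed; the data frame d and its sign column are hypothetical. layer_data() returns the bars that ggplot2 computed, which makes the difference between the positions visible without drawing the plot.

```r
library(ggplot2)

# Hypothetical data: two groups with shifted normal distributions
set.seed(1)
d <- data.frame(
  value = c(rnorm(500, mean = 0), rnorm(500, mean = 1)),
  sign  = rep(c("neg", "pos"), each = 500)
)

base <- ggplot(d, aes(x = value, fill = sign))
p_identity <- base + geom_histogram(position = "identity", alpha = 0.3, bins = 30)
p_stack    <- base + geom_histogram(position = "stack", bins = 30)
p_fill     <- base + geom_histogram(position = "fill", bins = 30)

# layer_data() returns the computed bars of the first layer
ld_identity <- layer_data(p_identity)
ld_stack    <- layer_data(p_stack)
ld_fill     <- layer_data(p_fill)

# identity: every bar starts at 0, so the groups overlap
print(range(ld_identity$ymin))
# stack: the top of each bin is the total count of both groups
print(max(ld_stack$ymax))
# fill: each bin is scaled so that its stack reaches 1 (empty bins give NaN)
print(max(ld_fill$ymax, na.rm = TRUE))
```

print(p_identity), print(p_stack) or print(p_fill) draws the corresponding plot.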

Using factor to control the stacking order

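When stacking, R may decide the order of the stacked groups automatically: a character variable is turned into a factor whose levels are sorted alphabetically. To stack in an arbitrary order, make the variable a factor yourself and give the order with levels=, as done here. ggplot2 stacks the groups in reverse level order, so the first level ends up at the top of each bar, matching the legend; position_stack(reverse = TRUE) reverses this.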
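(Note) The sample code uses stat = "identity", which takes the bar heights directly from the count column instead of counting rows. Because the data are in long format (one row per category) and the default position of geom_bar() is "stack", the result is still a stacked graph.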
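The effect of levels= can be checked on a single toy bar. This is a minimal sketch with made-up data (d, cat, cat2 and n are hypothetical); geom_col() is shorthand for geom_bar(stat = "identity"). The segment of height 3 belongs to level "c", which is listed first in levels=, so its ymin/ymax show where the first level ends up.

```r
library(ggplot2)

# factor() sorts the levels of a character vector alphabetically by default
print(levels(factor(c("b", "c", "a"))))

# Hypothetical long-format data: one bar split into three categories
d <- data.frame(
  x   = "P1",
  cat = c("a", "b", "c"),
  n   = c(1, 2, 3)
)
# Choose the stacking order explicitly with levels=
d$cat2 <- factor(d$cat, levels = c("c", "a", "b"))

p_default <- ggplot(d, aes(x = x, y = n, fill = cat2)) + geom_col()
p_reverse <- ggplot(d, aes(x = x, y = n, fill = cat2)) +
  geom_col(position = position_stack(reverse = TRUE))

# Pick the segment of height 3, i.e. level "c"
ld_default <- layer_data(p_default)
ld_reverse <- layer_data(p_reverse)
seg_c_default <- ld_default[ld_default$ymax - ld_default$ymin == 3, ]
seg_c_reverse <- ld_reverse[ld_reverse$ymax - ld_reverse$ymin == 3, ]
print(c(seg_c_default$ymin, seg_c_default$ymax))  # first level at the top
print(c(seg_c_reverse$ymin, seg_c_reverse$ymax))  # first level at the bottom
```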

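library(ggplot2)

curl <- "https://vrs-data.cio.go.jp/vaccination/opendata/latest/prefecture.ndjson"
cdestfile <- "~/R/R2/covid/prefecture.ndjson"
download.file(curl, cdestfile, mode = "wb")  # binary mode, so the file is not altered on Windows
# gzfile() reads the file whether or not it is gzip-compressed
js <- jsonlite::stream_in(gzfile(cdestfile))
tail(js)  # check the most recent rows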

# key such as "1-F--64": status (dose number) - gender - age group
df <- cbind(js, cat = paste(js$status, js$gender, js$age, sep = "-"))
# assigning to df$cat2 creates the new column
# the order of levels sets the stacking order; values not listed in levels become NA
df$cat2 <- factor(df$cat, levels = c("2-F--64", "2-F-65-", "2-M--64", "2-M-65-", "2-U--64", "2-U-65-", "2-U-UNK",
                                     "1-F--64", "1-F-65-", "1-M--64", "1-M-65-", "1-M-UNK", "1-U--64", "1-U-65-", "1-U-UNK"))
df$date <- as.Date(df$date)
# use column cat2 for fill
p <- ggplot(df, aes(y = count, x = prefecture, fill = cat2))
p <- p + theme_dark(base_family = "HiraKakuPro-W3")  # Hiragino font (macOS) so Japanese labels render
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p <- p + theme(panel.background = element_rect(fill = "black",
                                               colour = "lightblue"),
               legend.key = element_rect(fill='black',colour='white'))
# name for legend is specified in scale_fill_manual
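# labels are given in the same order as the levels of cat2
p <- p + scale_fill_manual(name = "Category", values = rainbow(15),
                           labels = c("Dose 2, female, 64 and under", "Dose 2, female, 65 and over",
                                      "Dose 2, male, 64 and under", "Dose 2, male, 65 and over",
                                      "Dose 2, sex unknown, 64 and under", "Dose 2, sex unknown, 65 and over", "Dose 2, sex unknown, age unknown",
                                      "Dose 1, female, 64 and under", "Dose 1, female, 65 and over",
                                      "Dose 1, male, 64 and under", "Dose 1, male, 65 and over", "Dose 1, male, age unknown",
                                      "Dose 1, sex unknown, 64 and under", "Dose 1, sex unknown, 65 and over", "Dose 1, sex unknown, age unknown"))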


p <- p + geom_bar(stat = "identity")  # bar heights taken from count; default position is "stack"
# p <- p + theme(legend.position = 'none')
p <- p + scale_x_discrete(labels = substr(pref_jp, 1, 3))  # pref_jp: names of the 47 prefectures in code order (defined elsewhere, not shown here)
plot(p)
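Unnamed values and labels in scale_fill_manual() are matched to the categories by position, so if a level is absent from the data they can shift onto the wrong categories. A minimal sketch with made-up data and hypothetical category keys ("2-F", "1-F", "1-M"), using named vectors so that each colour and label is tied to a level by name:

```r
library(ggplot2)

# Hypothetical long-format data: one row per prefecture x category
d <- data.frame(
  prefecture = rep(c("01", "02"), each = 3),
  cat2 = factor(rep(c("2-F", "1-F", "1-M"), times = 2),
                levels = c("2-F", "1-F", "1-M")),
  count = c(10, 20, 30, 5, 15, 25)
)

# Named vectors: colours and labels are matched to levels by name, not position
cat_cols   <- c("2-F" = "tomato", "1-F" = "gold", "1-M" = "steelblue")
cat_labels <- c("2-F" = "Dose 2, female", "1-F" = "Dose 1, female", "1-M" = "Dose 1, male")

p <- ggplot(d, aes(x = prefecture, y = count, fill = cat2)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(name = "Category", values = cat_cols, labels = cat_labels)

# The segment of height 30 is prefecture "01", category "1-M"
ld <- layer_data(p)
print(ld[, c("fill", "ymin", "ymax")])
```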

For reference, the contents of js are as follows. The sample code above joins status (vaccination status, i.e. the dose number), gender, and age into one key.

          date prefecture gender age medical_worker status count
1   2021-04-12         01      F -64          FALSE      1     7
2   2021-04-12         01      F 65-          FALSE      1    84
3   2021-04-12         01      M -64          FALSE      1     1
4   2021-04-12         01      M 65-          FALSE      1    21
5   2021-04-12         01      U UNK          FALSE      1     1
6   2021-04-12         02      F 65-          FALSE      1   142
7   2021-04-12         02      M 65-          FALSE      1    53
8   2021-04-12         02      U UNK          FALSE      1     2
9   2021-04-12         03      F 65-          FALSE      1    23
10  2021-04-12         03      M 65-          FALSE      1    27
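The key built on the cat line can be reproduced on the first rows shown above. This is a minimal sketch; js_head is a hypothetical hand-made copy of three of those rows, not the downloaded data. It shows that the positional indexes js[,6], js[,3], js[,4] and the column names give the same key, and that a key missing from levels= turns into NA.

```r
# Hypothetical copy of three rows from the js output above
js_head <- data.frame(
  date = "2021-04-12",
  prefecture = "01",
  gender = c("F", "F", "U"),
  age = c("-64", "65-", "UNK"),
  medical_worker = FALSE,
  status = 1L,
  count = c(7L, 84L, 1L)
)

# Columns 6, 3, 4 are status, gender, age; names make that explicit
key_by_index <- paste(js_head[, 6], js_head[, 3], js_head[, 4], sep = "-")
key_by_name  <- paste(js_head$status, js_head$gender, js_head$age, sep = "-")
print(key_by_name)

# A key that is not listed in levels becomes NA
cat2 <- factor(key_by_name, levels = c("1-F--64", "1-F-65-"))
print(cat2)
```

Checking sum(is.na(df$cat2)) after the factor() call is a quick way to spot categories that the levels list does not cover.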

Setting the x-axis break interval

Used when x is of class Date. Here there is one break per year, and each label shows the year.

p <- p + scale_x_date(date_breaks = "1 year", date_labels = "%Y")
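A sketch of date_breaks and date_labels on a hypothetical daily series (d and value are made up). layer_scales() returns the trained x scale; with "1 year" its breaks fall on January 1 of each year inside the data range.

```r
library(ggplot2)

# Hypothetical daily series spanning several years
d <- data.frame(date = seq(as.Date("2020-01-15"), as.Date("2023-06-01"), by = "day"))
d$value <- seq_len(nrow(d))

p <- ggplot(d, aes(x = date, y = value)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")

# Breaks of the trained x scale, stored as days since 1970-01-01
brk <- layer_scales(p)$x$get_breaks()
brk <- as.Date(brk[!is.na(brk)], origin = "1970-01-01")
print(brk)
```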

Specifying the order of x-axis items

The order can also be fixed when the data are defined as a factor, but here scale_x_discrete() is used.

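# limits: order of first appearance in the data; labels: first 3 characters of each name (assumes df has a prefecture_name column)
p <- p + scale_x_discrete(limits = unique(df$prefecture_name), labels = substr(unique(df$prefecture_name), 1, 3))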
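A sketch comparing the default order, scale_x_discrete(limits = ...) and the factor alternative. The romanized prefecture names and the data frame d are hypothetical, chosen so that the alphabetical default is predictable. A character x is sorted alphabetically by default; limits and factor levels both put the bars in data order.

```r
library(ggplot2)

# Hypothetical data, listed in the order it should appear on the x axis
d <- data.frame(
  prefecture_name = c("Hokkaido", "Aomori", "Iwate"),
  n = c(5, 3, 4)
)

p_default <- ggplot(d, aes(x = prefecture_name, y = n)) + geom_col()
p_limits  <- p_default +
  scale_x_discrete(limits = unique(d$prefecture_name),
                   labels = substr(unique(d$prefecture_name), 1, 3))

# Alternative: fix the order in the data itself with a factor
d2 <- d
d2$prefecture_name <- factor(d2$prefecture_name, levels = unique(d2$prefecture_name))
p_factor <- ggplot(d2, aes(x = prefecture_name, y = n)) + geom_col()

# x position (1, 2, 3) of the bar with n == 5, i.e. "Hokkaido"
pos_of_hokkaido <- function(p) {
  ld <- layer_data(p)
  unclass(ld$x[ld$ymax == 5])
}
print(c(default = pos_of_hokkaido(p_default),
        limits  = pos_of_hokkaido(p_limits),
        factor  = pos_of_hokkaido(p_factor)))
```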

Colour gradient according to a date value

Use this to shade the fill colour as a gradient according to a Date variable mapped to fill.

p <- p + scale_fill_date(low = "green3", high = "darkgreen")
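A sketch with hypothetical counts on three dates (d and count are made up), with the Date column mapped to both x and fill. scale_fill_date() maps the earliest date to low and the latest to high, so each bar gets a different shade.

```r
library(ggplot2)

# Hypothetical counts on three dates
d <- data.frame(
  date  = as.Date(c("2021-04-12", "2021-07-01", "2021-10-01")),
  count = c(10, 20, 30)
)

p <- ggplot(d, aes(x = date, y = count, fill = date)) +
  geom_col() +
  scale_fill_date(low = "green3", high = "darkgreen")

# Earliest date gets the colour at the "green3" end, latest at the "darkgreen" end
ld <- layer_data(p)
print(ld[, c("fill", "ymax")])
```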
