As an aside, this page has an interesting discussion of 1-bit (1.58-bit) LLMs and is worth a read.
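For a rough feel of where "1.58 bit" comes from: a weight limited to the three values -1, 0 and +1 carries log2(3) ≈ 1.58 bits. The sketch below illustrates this with the absmean rounding rule described in the BitNet b1.58 paper. The function name `absmean_ternary` and the sample weights are made up for illustration, and this is not taken from the repository's `utils_quant.py`.

```python
import math


def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of floats to {-1, 0, +1} using an absmean scale.

    Illustrative only: a hypothetical helper, not the repository's code.
    """
    # Scale by the mean absolute value of the weights (plus eps to avoid division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


# Three possible values per weight need log2(3) bits of information.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, -1.3, 0.4, 0.02, -0.6]  # made-up sample values
quantized, scale = absmean_ternary(weights)
print(f"{bits_per_weight:.2f} bits per weight")
print(quantized)
```

Small weights collapse to 0 and large ones saturate at ±1, which is why only a single shared scale needs to be kept in higher precision.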
The 1-bit LLM implementation itself is published here.
First, clone the entire Hugging Face repository with git:
$ git lfs install
$ git clone https://huggingface.co/1bitLLM/bitnet_b1_58-3B
$ cd bitnet_b1_58-3B
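Next, rewrite the relative imports in the cloned files as absolute imports so that the modules can be imported directly from this directory. That is, change
from .modeling_bitnet import BitnetForCausalLM
from .tokenization_bitnet import BitnetTokenizer
to
from modeling_bitnet import BitnetForCausalLM
from tokenization_bitnet import BitnetTokenizer
and likewise change
from .configuration_bitnet import BitnetConfig
from .utils_quant import BitLinear
to
from configuration_bitnet import BitnetConfig
from utils_quant import BitLinear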
$ pip install lm-eval==0.3.0
$ python eval_ppl.py --hf_path 1bitLLM/bitnet_b1_58-3B --seqlen 2048
2024-04-17 09:37:11.458427: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-17 09:37:11.498720: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-17 09:37:12.244885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Downloading shards: 100%|███████████████████████| 3/3 [00:00<00:00, 6550.19it/s]
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:03<00:00, 1.31s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Downloading readme: 100%|██████████████████| 41.1k/41.1k [00:00<00:00, 3.22MB/s]
Downloading data: 100%|████████████████████| 40.5M/40.5M [00:06<00:00, 5.82MB/s]
Generating validation split: 45576 examples [00:00, 236137.11 examples/s]
avg_loss = 3.28951310140223: 100%|██████| 10811/10811 [2:50:26<00:00, 1.06it/s]
c4 PPL: 9.777821724196246
Downloading readme: 100%|██████████████████████████████████████████████████████| 10.5k/10.5k [00:00<00:00, 20.1MB/s]
Downloading data: 100%|██████████████████████████████████████████████████████████| 733k/733k [06:33<00:00, 1.86kB/s]
Downloading data: 100%|████████████████████████████████████████████████████████| 6.36M/6.36M [00:00<00:00, 9.89MB/s]
Downloading data: 100%|██████████████████████████████████████████████████████████| 657k/657k [00:00<00:00, 2.03MB/s]
Generating test split: 100%|█████████████████████████████████████████| 4358/4358 [00:00<00:00, 254020.08 examples/s]
Generating train split: 100%|██████████████████████████████████████| 36718/36718 [00:00<00:00, 479998.17 examples/s]
Generating validation split: 100%|███████████████████████████████████| 3760/3760 [00:00<00:00, 270781.46 examples/s]
avg_loss = 3.3192534145618975: 100%|██████████████████████████████████████████████| 174/174 [01:57<00:00, 1.48it/s]
wikitext2 PPL: 9.98147770371931
[9.777821724196246, 9.98147770371931]
Avg PPL: 9.879649713957779
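The figures in this log fit together in a simple way. Each reported PPL equals 2 raised to the `avg_loss` shown on the progress bar, which suggests the loss is being reported in bits per token, and `Avg PPL` is the arithmetic mean of the two dataset PPLs. The snippet below is only a consistency check on the logged numbers, not a description of how `eval_ppl.py` is written.

```python
# avg_loss values copied from the progress bars in the log above.
avg_loss = {
    "c4": 3.28951310140223,
    "wikitext2": 3.3192534145618975,
}

# Perplexity as 2 ** (loss in bits per token).
ppl = {name: 2 ** loss for name, loss in avg_loss.items()}

# The final figure in the log is the plain mean over the datasets.
avg_ppl = sum(ppl.values()) / len(ppl)

for name, value in ppl.items():
    print(f"{name} PPL: {value}")
print(f"Avg PPL: {avg_ppl}")
```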
$ python
Python 3.10.9 (main, Mar 1 2023, 18:23:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenization_bitnet import BitnetTokenizer
/home/memeplex/.pyenv/versions/anaconda3-2023.03/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
>>> from modeling_bitnet import BitnetForCausalLM
2024-04-18 06:38:17.567606: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-18 06:38:17.603207: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-18 06:38:18.173705: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> model_str="1bitLLM/bitnet_b1_58-3B"
>>> import torch
>>> model = BitnetForCausalLM.from_pretrained(
... model_str,
... device_map='auto',
... low_cpu_mem_usage=True,
... use_flash_attention_2=True,
... torch_dtype=torch.float16,
... ).half()
>>> tokenizer = BitnetTokenizer.from_pretrained(model_str,use_fast=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> d = tokenizer("The most important thing of the United States",padding="max_length",return_tensors="pt")
>>> with torch.no_grad():
...     t = model.generate(d["input_ids"],attention_mask=d["attention_mask"],max_length=100,repetition_penalty=1.5)
>>> tokenizer.decode(t[0])
'<s> The most important thing of the United States is that it has a great history. It was founded by brave men and women who fought for freedom, liberty, justice, equality, democracy etc… They were all fighting to make this country better than what they had before so we can live in peace with each other as well as our environment.\nThe first president George Washington said “We hold these truths to be self-evident: That there are certain rights which God hath given us;'
The input prompt was "The most important thing of the United States", and the decoded string above is the model's continuation of it.
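The `model.generate` calls in this session pass `repetition_penalty=1.5`. As a rough illustration of what that setting does, the sketch below follows the rule documented for the Hugging Face repetition penalty: the score of every token that has already appeared is divided by the penalty when positive and multiplied by it when negative, so in both cases it becomes less likely. The function name `apply_repetition_penalty` and the sample logits are hypothetical, and this is a simplified pure-Python version rather than the library's own code.

```python
def apply_repetition_penalty(logits, seen_ids, penalty=1.5):
    """Return a copy of `logits` with already-seen tokens pushed down.

    Illustrative only: `logits` is a plain list indexed by token id, and
    `seen_ids` holds the token ids that appeared so far (prompt and output).
    """
    adjusted = list(logits)
    for token_id in set(seen_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.0, -1.0, 0.5, 3.0]   # made-up next-token scores for a 4-token vocabulary
seen_ids = [0, 1, 0]             # tokens 0 and 1 have already appeared
adjusted = apply_repetition_penalty(logits, seen_ids)
print(adjusted)
```

Tokens 2 and 3 keep their scores because they have not appeared yet. A penalty of 1.0 would leave every score unchanged.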
>>> d = tokenizer("東京は",padding="max_length",return_tensors="pt")
>>> with torch.no_grad():
...     t = model.generate(d["input_ids"],attention_mask=d["attention_mask"],max_length=100,repetition_penalty=1.5)
>>> tokenizer.decode(t[0])
'<s> 東京は、半年に終了した。\nTokyo is about to finish. (Japanese)\nThe Tokyo Olympics are over, and the city has been transformed into a ghost town for weeks now as people have fled in droves from their homes due to fears of an outbreak or even worse: COVID-19 itself. The games were supposed to be held between July 23rd – August 8th but'
Here the prompt was 「東京は」 ("Tokyo is"). The Japanese opening of the output, 「東京は、半年に終了した。」, reads roughly "Tokyo ended in half a year."; the rest of the continuation is already in English.