bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

JZ - lots of information about our work environment (the Jean Zay supercomputer), which helps us evaluate, plan and get things done
Experiments - many experiments are being done; documentation, result tables, scripts and logs are all there
Datasets info
Train - all the information about the current trainings (see below for the most important ones)

We have READMEs for specific aspects, such as:

hub integration

Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned

Train 1 - 13B - unmodified Megatron gpt2 - baseline

the full spec and discussions
the training script
checkpoints and logs:
- tensorboard
- logs
chronicles

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
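The one-liner works by polling: a HEAD request (curl -sI) reads the current content-length, a ranged GET (curl -r) fetches only the bytes appended since the previous poll, and the loop then sleeps 300 seconds. The following is a rough Python equivalent using only the standard library, offered as a sketch rather than a tool shipped with this repo; the helper names next_range, content_length, fetch_range and follow are hypothetical, and it assumes the server reports Content-Length and honours Range requests.

```python
import sys
import time
import urllib.request

# Assumption: any URL whose server reports Content-Length and honours
# Range requests can be followed this way; this is the tr1 log URL from above.
LOG_URL = "https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt"
POLL_SECONDS = 300  # same interval as `sleep 300` in the perl one-liner


def next_range(offset, length):
    """Range header value for the bytes after `offset`, or None if the file has not grown."""
    if length <= offset:
        return None
    # HTTP byte ranges are inclusive, so the last byte index is length - 1.
    return f"bytes={offset}-{length - 1}"


def content_length(url):
    """Current size of the remote file; urlopen follows redirects on its own."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])


def fetch_range(url, byte_range):
    """Download only the requested byte range (expects a 206 Partial Content reply)."""
    req = urllib.request.Request(url, headers={"Range": byte_range})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def follow(url=LOG_URL, interval=POLL_SECONDS):
    """Poll forever, printing whatever was appended since the previous poll."""
    offset = 0  # bytes already printed, like $b in the perl version
    while True:
        length = content_length(url)
        byte_range = next_range(offset, length)
        if byte_range is not None:
            # Write raw bytes so a multi-byte character split across polls is not mangled.
            sys.stdout.buffer.write(fetch_range(url, byte_range))
            sys.stdout.buffer.flush()
        offset = length  # like `$b=$e`: remember the size seen on this poll
        time.sleep(interval)
```

Calling follow() starts the loop; stop it with Ctrl-C. Because urlopen follows redirects for the size query as well, this sketch sidesteps the redirect issue that the tr11 variant handles with curl -L.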

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

Dataset + warmup / model size	1B3	760M	350M	125M
C4 + low warmup	a	b	c	-
OSCAR + low warmup	f	-	-	-
C4 + high warmup	e	-	-	-
OSCAR + high warmup	d (current baseline)	g	h	i
Pile + high warmup	m	j	k	l

Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

the full spec and discussions
the training script
checkpoints and logs:
- tensorboard
- logs
chronicles

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9

Train 11

This is the current main training

tr11-176B-ml

the full spec and discussions
the training script
checkpoints and logs:
- tensorboard
- logs
chronicles-prequel
chronicles

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt
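The tr11 variant differs from the earlier ones: it passes -L to the HEAD request and only accepts a content-length that follows a "2 200" status line. With -L, curl prints the headers of every hop in the redirect chain, so the first content-length in its output may belong to the redirect response rather than to the log file. The Python sketch below mimics that selection on captured header text. The helper name final_content_length and the sample headers are made up for illustration. Like the perl regex, it assumes HTTP/2 with lowercase header names, since an HTTP/1.1 status line reads "HTTP/1.1 200".

```python
import re

# Same idea as the perl regex /2 200.*?content-length: (\d+)/s:
# skip everything up to a "2 200" status line, then take the first
# content-length header that follows it (re.DOTALL lets .*? cross lines).
FINAL_LENGTH = re.compile(r"2 200.*?content-length: (\d+)", re.DOTALL)


def final_content_length(curl_head_output):
    """Return the content-length of the first 200 response in `curl -LsI` output, or None."""
    match = FINAL_LENGTH.search(curl_head_output)
    return int(match.group(1)) if match else None


# Hypothetical `curl -LsI` output for a URL that redirects once.
sample = (
    "HTTP/2 302\r\n"
    "content-length: 1234\r\n"
    "location: https://cdn.example.com/main_log.txt\r\n"
    "\r\n"
    "HTTP/2 200\r\n"
    "content-length: 987654\r\n"
    "\r\n"
)

print(final_content_length(sample))  # 987654, not the redirect's 1234
```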
