dhSegment

About This Fork of dhSegment: This variant of dhSegment has been customized for Internet Archive. No attempt has been made to make the changes general enough to be folded back into dhSegment as pull requests (except for minor fixes). New files added in this fork are identified by the file prefix "ia_". These might be helpful to others, at least for showing what is possible and how we did it. Files under dh_segment/ and train.py have not been changed.
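Because the fork-specific files share the "ia_" prefix, they can be picked out of a checkout by name. The snippet below is a minimal sketch of that idea, not part of the repository: the helper name `fork_specific_files` is hypothetical, and it assumes the added files are Python scripts at the top level of the checkout.

```python
from pathlib import Path


def fork_specific_files(repo_root, prefix="ia_"):
    """Return the sorted names of top-level Python files whose name starts with `prefix`.

    Hypothetical helper for illustration; it only inspects file names.
    """
    return sorted(p.name for p in Path(repo_root).glob(f"{prefix}*.py"))


if __name__ == "__main__":
    # Assumes the script is run from the root of a checkout of this fork.
    for name in fork_specific_files("."):
        print(name)
```

Files without the prefix, such as train.py and demo.py, are skipped.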

dhSegment is a tool for historical document processing. Its generic approach makes it possible to segment regions and extract content from different types of documents. See some examples here.

The complete description of the system can be found in the corresponding paper.

It was created by Benoit Seguin and Sofia Ares Oliveira at DHLAB, EPFL.

Installation and usage

The installation procedure and examples of usage can be found in the documentation (see section below).

Demo

Try the demo, which uses the demo.py script to train a model (optional) and apply dhSegment to page extraction.

Documentation

The documentation is available on readthedocs.

If you use this code in your research, you can cite the corresponding paper as:

@inproceedings{oliveiraseguinkaplan2018dhsegment,
  title={dhSegment: A generic deep-learning approach for document segmentation},
  author={Ares Oliveira, Sofia and Seguin, Benoit and Kaplan, Frederic},
  booktitle={Frontiers in Handwriting Recognition (ICFHR), 2018 16th International Conference on},
  pages={7--12},
  year={2018},
  organization={IEEE}
}
