LaMachine is a unified software distribution for Natural Language Processing

We integrate numerous open-source NLP tools, programming libraries, web-services and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines.


Important News!

LaMachine is end-of-life and being deprecated. Its usage is no longer recommended. See this post for reasons and alternative solutions.

Get Started

LaMachine comes in various flavours: it can be installed as a Virtual Machine, as a Docker container, or directly on your Linux or macOS system.

Start the bootstrap procedure by running a single command and let it guide you through your custom build, or just use one of our pre-built images with Vagrant or Docker!

Your Operating System?

Linux, BSD, or macOS

Windows 10 / 2016

Older Windows

Custom build or pre-built?

Own build (recommended)

Pre-built Docker image

Pre-built VM image

Open a terminal and run the following:

bash <(curl -s https://raw.githubusercontent.com/proycon/LaMachine/master/bootstrap.sh)
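If you would rather inspect the bootstrap script before it runs, one option is to download it to a file first. The following is a minimal sketch, not part of the official instructions: the helper name `fetch_bootstrap` and the local filename `lamachine-bootstrap.sh` are assumptions chosen for illustration; only the URL comes from the command above.

```shell
# URL taken from the one-line command above.
BOOTSTRAP_URL="https://raw.githubusercontent.com/proycon/LaMachine/master/bootstrap.sh"

# Hypothetical local filename; override it by setting BOOTSTRAP_FILE beforehand.
BOOTSTRAP_FILE="${BOOTSTRAP_FILE:-lamachine-bootstrap.sh}"

# Hypothetical helper: download the script to disk without executing it.
#   -s  silent, -S  still print errors, -f  fail on HTTP errors, -o  output file
fetch_bootstrap() {
    curl -sSf -o "$BOOTSTRAP_FILE" "$BOOTSTRAP_URL"
}
```

After defining the helper, you could run `fetch_bootstrap`, read the downloaded file in a pager or editor of your choice, and then start it with `bash lamachine-bootstrap.sh`. Nothing is executed until that last step, which gives you a chance to review what the script will do.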


About LaMachine

The software included in LaMachine tends to be highly specialised and generally depends on a lot of other interdependent software. Installing all this software can be a daunting task, and compiling it from scratch even more so. LaMachine attempts to make this process easier by offering pre-built recipes for a wide variety of systems. Whether you are working on your home computer or setting up a dedicated production environment, LaMachine will save you a lot of work.

We address various audiences. The bulk of the software is geared towards data scientists who are not afraid of the command line and some programming: we give you the instruments, and it is up to you to wield them. However, we also attempt to accommodate researchers who require more high-level interfaces by incorporating webservices and websites that expose some of the functionality to a larger audience.

Software

by the Centre for Language and Speech Technology, Radboud University Nijmegen (CLST, RU)
- Timbl - Tilburg Memory Based Learner (previously by TiCC, Tilburg University)
- Ucto - A rule-based tokeniser supporting multiple languages
- Frog - An integration of various memory-based natural language processing (NLP) modules developed for Dutch. It can do part-of-speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis.
- Mbt - Memory-based Tagger
- Wopr - Memory-based Word Predictor
- FoLiA-tools - Command line tools for working with the FoLiA format
- PyNLPl - Python Natural Language Processing Library
- Colibri Core - An NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
- C++ libraries - ticcutils, libfolia
- Python bindings - python-ucto, python-frog, python-timbl
- CLAM - Quickly build RESTful webservices
- Gecco - Generic Environment for Context-Aware Correction of Orthography (powers Valkuil)
- Toad - Trainer Of All Data, training tools for Frog
- foliadocserve - FoLiA Document Server
- FLAT - FoLiA Linguistic Annotation Tool
- BabelEnte - An entity extractor, translator and evaluator that uses Babelfy
- Valkuil - A context-aware spelling corrector for Dutch
- TicclTools - Tools that together constitute the bulk of TICCL: Text Induced Corpus-Cleanup.
- PICCL - Pipelines for spelling correction and OCR post-correction; implements TICCL (also by Tilburg University)
- Labirinto - A web-based portal listing all available tools in LaMachine, an ideal starting point for LaMachine
- Oersetter - A Frisian-Dutch/Dutch-Frisian Machine Translation system in collaboration with the Fryske Akademy
by the University of Groningen (RUG)
- Alpino - A dependency parser and tagger for Dutch
by Utrecht University (UU)
- T-scan - A Dutch text analytics tool for readability prediction (initially developed at TiCC, Tilburg University)
by the Vrije Universiteit Amsterdam (VU)
- KafNafParserPy - A Python module to parse NAF files
Major third party software (not exhaustive!)
- Python
  - NumPy
  - SciPy
  - Matplotlib
  - IPython
  - Pandas - Python Data Analysis Library
  - Scikit-learn - Machine learning in Python
  - NLTK - Natural Language Toolkit for Python
  - spaCy - Industrial-Strength NLP in Python
  - PyTorch - Deep learning library
- Java
  - Nextflow - A system and language for writing parallel and scalable pipelines in a portable manner. (written in Groovy)
  - Stanford CoreNLP
- R
- Jupyter
- Tesseract - Open Source Optical Character Recognition (OCR)
- Hunspell - A spell checker
- TensorFlow - Open-source machine learning framework
- Kaldi - Speech Recognition Framework (ASR)
- Moses - Statistical Machine Translation system

Documentation

For further documentation, please read the README and the Contributor Guidelines for technical details.

Contribute

LaMachine is open to participation by other open-source NLP software! Please read our Contributor Guidelines.

Support

If you have a problem, please report it on our Issue Tracker.