Friday, 25 February 2022

Getting Started with Moses


This section will show you how to install and build Moses, and how to use Moses to translate with some simple models. If you experience problems, then please check the support page. If you do not want to build Moses from source, then there are packages available for Windows and popular Linux distributions.

Compiling Moses with bjam

To compile with the bare minimum of features:

    git clone https://github.com/moses-smt/mosesdecoder
    cd mosesdecoder
    ./bjam -j4

If you have compiled boost manually, then tell bjam where it is:

    ./bjam --with-boost=~/workspace/temp/boost_1_64_0 -j8

If you have compiled the cmph library manually:

    ./bjam --with-cmph=/Users/hieu/workspace/cmph-2.0

If you have compiled the xmlrpc-c library manually:

    ./bjam --with-xmlrpc-c=/Users/hieu/workspace/xmlrpc-c/xmlrpc-c-1.33.17

If you have compiled the IRSTLM library manually:

    ./bjam --with-irstlm=/Users/hieu/workspace/irstlm/irstlm-5.80.08/trunk 

This is the exact command I (Hieu) used on Linux:

   ./bjam --with-boost=/home/s0565741/workspace/boost/boost_1_57_0 --with-cmph=/home/s0565741/workspace/cmph-2.0 --with-irstlm=/home/s0565741/workspace/irstlm-code --with-xmlrpc-c=/home/s0565741/workspace/xmlrpc-c/xmlrpc-c-1.33.17 -j12
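The optional --with-* flags above can be assembled by a small helper, so that a flag is only passed when its directory actually exists. This is a minimal sketch, not part of Moses: the function name bjam_command and its name=path argument format are made up for illustration, and it assumes paths without spaces.

```shell
# Print a bjam command line for the given job count and dependencies.
# Usage (hypothetical helper): bjam_command JOBS name=path [name=path ...]
# Each name is turned into the matching --with-NAME flag from the text above
# (boost, cmph, xmlrpc-c, irstlm). Dependencies whose directory does not
# exist are skipped with a warning on stderr.
bjam_command() (
    jobs="$1"
    shift
    cmd="./bjam"
    for dep in "$@"; do
        name="${dep%%=*}"   # text before the first '='
        path="${dep#*=}"    # text after the first '='
        if [ -d "$path" ]; then
            cmd="$cmd --with-$name=$path"
        else
            echo "skipping $name: $path is not a directory" >&2
        fi
    done
    printf '%s\n' "$cmd -j$jobs"
)
```

For example, bjam_command 12 boost=$HOME/workspace/boost/boost_1_57_0 cmph=$HOME/workspace/cmph-2.0 prints the command to run, which you can inspect before executing it.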

Manually installing Boost

Boost 1.48 has a serious bug which breaks Moses compilation. Unfortunately, some Linux distributions (eg. Ubuntu 12.04) have broken versions of the Boost library. In these cases, you must download and compile Boost yourself.

These are the exact commands I (Hieu) use to compile Boost:

   wget https://dl.bintray.com/boostorg/release/1.64.0/source/boost_1_64_0.tar.gz
   tar zxvf boost_1_64_0.tar.gz 
   cd boost_1_64_0/
   ./bootstrap.sh 
   ./b2 -j4 --prefix=$PWD --libdir=$PWD/lib64 --layout=system link=static install || echo FAILURE

This creates the library files in the lib64 directory, not in the system directory, so you do not need to be a system administrator (root) to run it. However, you will need to tell Moses where to find Boost, as explained below.

Once Boost is installed, you can compile Moses. You must tell Moses where Boost is with the --with-boost flag. This is the exact command I use to compile Moses:

   ./bjam --with-boost=~/workspace/temp/boost_1_64_0 -j4
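Before deciding whether to build Boost yourself, it can help to check which version your distribution ships. Boost records its version as an integer in the BOOST_VERSION macro in boost/version.hpp (for example 106400 for 1.64.0). The sketch below decodes that number. The function names are made up for illustration, and the location of the header (often /usr/include/boost/version.hpp on Linux) is an assumption that varies between systems.

```shell
# Convert the integer value of BOOST_VERSION into dotted form.
# The encoding is major * 100000 + minor * 100 + patch.
boost_version_string() (
    v="$1"
    echo "$(($v / 100000)).$(($v / 100 % 1000)).$(($v % 100))"
)

# Read BOOST_VERSION from a given version.hpp file and print it dotted.
boost_version_of() (
    header="$1"
    v=$(sed -n 's/^#define BOOST_VERSION \([0-9][0-9]*\).*/\1/p' "$header")
    boost_version_string "$v"
)
```

If boost_version_of /usr/include/boost/version.hpp reports 1.48.0, that is the release described above as breaking Moses compilation, and a manual Boost build is needed.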

Other software to install

Word Alignment

Moses requires a word alignment tool, such as GIZA++, MGIZA, or Fast Align.

I (Hieu) use MGIZA because it is multi-threaded and gives generally good results; however, I have also heard good things about Fast Align. You can find instructions to compile them here.

Language Model Creation

Moses includes the KenLM language model creation program, lmplz.

You can also create language models with IRSTLM and SRILM. Please read this if you want to compile IRSTLM. Language model toolkits perform two main tasks: training and querying. You can train a language model with any of them, produce an ARPA file, and query with a different one. To train a model, just call the relevant script.

If you want to use SRILM or IRSTLM to query the language model, then they need to be linked with Moses. For IRSTLM, you first need to compile IRSTLM then use the --with-irstlm switch to compile Moses with IRSTLM. This is the exact command I used:

   ./bjam --with-irstlm=/home/s0565741/workspace/temp/irstlm-5.80.03 -j4

Personally, I only use IRSTLM as a query tool in this way if the LM n-gram order is over 7. In most situations, I use KenLM because it is multi-threaded and faster.

Platforms

The primary development platform for Moses is Linux, and this is the recommended platform since you will find it easier to get support for it. However, Moses does work on other platforms, as described below.

Linux Installation

Debian

Install the following packages using the command:

   su
   apt-get install [package name]

Packages:

   git
   subversion
   make
   libtool
   gcc
   g++
   libboost-dev
   tcl-dev
   tk-dev
   zlib1g-dev
   libbz2-dev
   python-dev
   libicu-dev (Debian)
   libunistring-dev (Debian)

Ubuntu

Install the following packages using the command:

   sudo apt-get install [package name]

Packages:

   g++ 
   git 
   subversion
   automake
   libtool
   zlib1g-dev
   libicu-dev
   libboost-all-dev
   libbz2-dev
   liblzma-dev
   python-dev
   graphviz
   imagemagick
   make
   cmake
   libgoogle-perftools-dev (for tcmalloc)
   autoconf
   doxygen

Fedora / Redhat / CentOS / Scientific Linux

Install the following packages using the command:

   su
   yum install [package name]

Packages:

   git
   subversion
   make
   automake
   cmake
   libtool
   gcc-c++
   zlib-devel
   python-devel
   bzip2-devel
   boost-devel
   ImageMagick
   cpan
   expat-devel

In addition, you have to install some perl packages:

   cpan XML::Twig
   cpan Sort::Naturally

OSX Installation

Mac OSX is widely used by Moses developers and everything should run fine. Installation is the same as for Linux.

Out of the box, Mac OSX lacks many programs that are critical to Moses, or ships different versions of standard GNU programs. For example, split, sort, and zcat are incompatible BSD versions rather than GNU versions.

Therefore, Moses has been tested on Mac OSX with MacPorts. Make sure you have this installed on your machine. Success has also been reported with a brew installation. Note, however, that you will need to install xmlrpc-c independently, and then compile with bjam using the --with-xmlrpc-c=/usr/local flag (where /usr/local/ is the default location of the xmlrpc-c installation).

Recent versions of OSX ship the clang C/C++ compiler rather than gcc. When compiling with bjam, you must add the following:

   ./bjam toolset=clang

This is the exact command I (Hieu) use on OSX Yosemite:

   ./bjam --with-boost=/Users/hieu/workspace/boost/boost_1_59_0.clang/ --with-cmph=/Users/hieu/workspace/cmph-2.0 --with-xmlrpc-c=/Users/hieu/workspace/xmlrpc-c/xmlrpc-c-1.33.17 --with-irstlm=/Users/hieu/workspace/irstlm/irstlm-5.80.08/trunk --with-mm --with-probing-pt -j5 toolset=clang -q -d2

You also need to add this argument when manually compiling boost. This is the exact command I use:

    ./b2 -j8 --prefix=$PWD --libdir=$PWD/lib64 --layout=system link=static toolset=clang  install || echo FAILURE
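If you share one build script between Linux and OSX, the toolset argument can be chosen from the output of uname -s, which prints Darwin on OSX. This is a minimal sketch, and the function name bjam_toolset_flag is made up for illustration. It assumes that clang is wanted on every OSX machine and the default toolset everywhere else.

```shell
# Print "toolset=clang" on OSX (kernel name Darwin) and nothing elsewhere.
# The kernel name can be passed explicitly; by default it is read from uname.
bjam_toolset_flag() (
    os="${1:-$(uname -s)}"
    case "$os" in
        Darwin) echo "toolset=clang" ;;
        *)      echo "" ;;
    esac
)
```

A build script could then call ./bjam -j4 $(bjam_toolset_flag), leaving the substitution unquoted so that an empty result adds no argument.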

Windows Installation

Moses can run on Windows 10 under the Ubuntu 16.04 subsystem (Windows Subsystem for Linux), which is enabled through the Windows features settings. More information is available here:

   https://docs.microsoft.com/en-us/windows/wsl/install-win10

Thereafter, installation is exactly the same as for Ubuntu.

Run Moses for the first time

Download the sample models and extract them into your working directory:

 cd ~/mosesdecoder
 wget http://www.statmt.org/moses/download/sample-models.tgz
 tar xzf sample-models.tgz
 cd sample-models

Run the decoder:

 cd ~/mosesdecoder/sample-models
 ~/mosesdecoder/bin/moses -f phrase-model/moses.ini < phrase-model/in > out

If everything worked out right, this should translate the sentence "das ist ein kleines haus" (in the file in) as "this is a small house" (in the file out).
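The check described above can be automated with a small comparison, sketched here. The function name check_translation is made up for illustration. Trailing whitespace is stripped from each line before comparing, on the assumption that the decoder may leave a space at the end of the output line.

```shell
# Succeed if the file named by $2 contains exactly the text in $1,
# ignoring trailing whitespace on each line. On a mismatch, report the
# actual content on stderr and return a non-zero status.
check_translation() (
    expected="$1"
    actual=$(sed 's/[[:space:]]*$//' "$2")
    if [ "$actual" != "$expected" ]; then
        echo "unexpected translation: '$actual'" >&2
    fi
    [ "$actual" = "$expected" ]
)
```

After running the decoder, check_translation "this is a small house" out returns success when the sample model produced the expected sentence.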

Note that the configuration file moses.ini in each directory is set to use the KenLM language model toolkit by default. If you prefer to use IRSTLM, then edit the language model entry in moses.ini, replacing KENLM with IRSTLM. You will also have to compile with ./bjam --with-irstlm, adding the full path of your IRSTLM installation.

Moses also supports SRILM and RandLM language models. See here for more details.

Chart Decoder

The chart decoder is part of the same executable as of version 3.0.

You can run the chart demos from the sample-models directory as follows:

 ~/mosesdecoder/bin/moses -f string-to-tree/moses.ini < string-to-tree/in > out.stt
 ~/mosesdecoder/bin/moses -f tree-to-tree/moses.ini < tree-to-tree/in.xml > out.ttt

The expected result of the string-to-tree demo is:

 this is a small house

Next Steps

Why not try to build a Baseline translation system with freely available data?

bjam options

This is a list of options to bjam. On a system with Boost installed in a standard path, none should be required, but you may want additional functionality or control.


from  https://www.statmt.org/moses/?n=Development.GetStarted
