
Monday, 1 October 2018

Document converter: Pandoc

The universal markup converter

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert from

commonmark (CommonMark Markdown)
creole (Creole 1.0)
docbook (DocBook)
docx (Word docx)
epub (EPUB)
fb2 (FictionBook2 e-book)
gfm (GitHub-Flavored Markdown), or the deprecated and less accurate markdown_github; use markdown_github only if you need extensions not supported in gfm.
haddock (Haddock markup)
html (HTML)
jats (JATS XML)
json (JSON version of native AST)
latex (LaTeX)
markdown (Pandoc’s Markdown)
markdown_mmd (MultiMarkdown)
markdown_phpextra (PHP Markdown Extra)
markdown_strict (original unextended Markdown)
mediawiki (MediaWiki markup)
muse (Muse)
native (native Haskell)
odt (ODT)
opml (OPML)
org (Emacs Org mode)
rst (reStructuredText)
t2t (txt2tags)
textile (Textile)
tikiwiki (TikiWiki markup)
twiki (TWiki markup)
vimwiki (Vimwiki)

It can convert to

asciidoc (AsciiDoc)
beamer (LaTeX beamer slide show)
commonmark (CommonMark Markdown)
context (ConTeXt)
docbook or docbook4 (DocBook 4)
docbook5 (DocBook 5)
docx (Word docx)
dokuwiki (DokuWiki markup)
epub or epub3 (EPUB v3 book)
epub2 (EPUB v2)
fb2 (FictionBook2 e-book)
gfm (GitHub-Flavored Markdown), or the deprecated and less accurate markdown_github; use markdown_github only if you need extensions not supported in gfm.
haddock (Haddock markup)
html or html5 (HTML, i.e. HTML5/XHTML polyglot markup)
html4 (XHTML 1.0 Transitional)
icml (InDesign ICML)
jats (JATS XML)
json (JSON version of native AST)
latex (LaTeX)
man (groff man)
markdown (Pandoc’s Markdown)
markdown_mmd (MultiMarkdown)
markdown_phpextra (PHP Markdown Extra)
markdown_strict (original unextended Markdown)
mediawiki (MediaWiki markup)
ms (groff ms)
muse (Muse)
native (native Haskell)
odt (OpenOffice text document)
opml (OPML)
opendocument (OpenDocument)
org (Emacs Org mode)
plain (plain text)
pptx (PowerPoint slide show)
rst (reStructuredText)
rtf (Rich Text Format)
texinfo (GNU Texinfo)
textile (Textile)
slideous (Slideous HTML and JavaScript slide show)
slidy (Slidy HTML and JavaScript slide show)
dzslides (DZSlides HTML5 + JavaScript slide show)
revealjs (reveal.js HTML5 + JavaScript slide show)
s5 (S5 HTML and JavaScript slide show)
tei (TEI Simple)
zimwiki (ZimWiki markup)
the path of a custom lua writer, see Custom writers below

Pandoc can also produce PDF output via LaTeX, Groff ms, or HTML.

Pandoc’s enhanced version of Markdown includes syntax for tables, definition lists, metadata blocks, footnotes, citations, math, and much more. See the User’s Manual below under Pandoc’s Markdown.

Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an abstract syntax tree or AST), and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. Users can also run custom pandoc filters to modify the intermediate AST (see the documentation for filters and lua filters).
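The reader/AST/writer split described above can be pictured with a toy sketch. This is not pandoc's API: the format names `markdown_lite`, `html_lite`, and `plain`, the tuple-based AST, and every function below are hypothetical, and the HTML writer does no escaping. It only shows why adding a format means adding one reader or one writer.

```python
# Toy illustration of a reader -> AST -> writer design (hypothetical, not pandoc's API).

def read_markdown_lite(text):
    """Hypothetical reader: '# x' lines become headers, other non-empty lines paragraphs."""
    ast = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            ast.append(("Header", line[2:]))
        else:
            ast.append(("Para", line))
    return ast


def write_html_lite(ast):
    """Hypothetical writer: render the toy AST as HTML (no escaping)."""
    tags = {"Header": "h1", "Para": "p"}
    return "\n".join(f"<{tags[kind]}>{text}</{tags[kind]}>" for kind, text in ast)


def write_plain(ast):
    """Hypothetical writer: render the toy AST as plain text with upper-case headers."""
    return "\n\n".join(text.upper() if kind == "Header" else text for kind, text in ast)


# Supporting a new format only means registering one more reader or writer.
READERS = {"markdown_lite": read_markdown_lite}
WRITERS = {"html_lite": write_html_lite, "plain": write_plain}


def convert(text, source, target):
    ast = READERS[source](text)   # parse into the shared representation
    return WRITERS[target](ast)   # render from the shared representation


print(convert("# Title\n\nHello", "markdown_lite", "html_lite"))
```

Every writer works with every reader because both sides only know the shared AST.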

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

Installing

Here’s how to install pandoc.

Documentation

Pandoc’s website contains a full User’s Guide. It is also available here as pandoc-flavored Markdown. The website also contains some examples of the use of pandoc and a limited online demo.

from https://github.com/jgm/pandoc

-------

Installing pandoc

Windows

There is a package installer at pandoc's download page. This will install pandoc, replacing older versions, and update your path to include the directory where pandoc's binaries are installed.
If you prefer not to use the msi installer, we also provide a zip file that contains pandoc's binaries and documentation. Simply unzip this file and move the binaries to a directory of your choice.
Alternatively, you can install pandoc using chocolatey: choco install pandoc.
For PDF output, you'll also need to install LaTeX. We recommend MiKTeX.

macOS

There is a package installer at pandoc's download page. If you later want to uninstall the package, you can do so by downloading this script and running it with perl uninstall-pandoc.pl.
We also provide a zip file containing the binaries and man pages, for those who prefer not to use the installer. Simply unzip the file and move the binaries and man pages to whatever directory you like.
Alternatively, you can install pandoc using homebrew: brew install pandoc. Note: If you are using macOS < 10.10, this method installs pandoc from source, so it will take a long time and a lot of disk space for the ghc compiler and dependent Haskell libraries.
For PDF output, you'll also need LaTeX. Because a full MacTeX installation takes more than a gigabyte of disk space, we recommend installing BasicTeX (64M) and using the tlmgr tool to install additional packages as needed. If you get errors warning of fonts not found, try
```
tlmgr install collection-fontsrecommended
```

Linux

First, try your package manager. Pandoc is in the Debian, Ubuntu, Slackware, Arch, Fedora, NixOS, openSUSE, and Gentoo repositories. Note, however, that versions in the repositories are often old.
We provide a binary package for amd64 architecture on the download page. This provides both pandoc and pandoc-citeproc. The executables are statically linked and have no dynamic dependencies or dependencies on external data files. Note: because of the static linking, the pandoc binary from this package cannot use lua filters that require external lua modules written in C.

Both a tarball and a deb installer are provided. To install the deb:
```
sudo dpkg -i $DEB
```
where $DEB is the path to the downloaded deb. This will install the pandoc and pandoc-citeproc executables and man pages.

If you use an RPM-based distro, you may be able to install the deb from our download page using alien.

On any distro, you may install from the tarball into $DEST (say, /usr/local/ or $HOME/.local) by doing
```
tar xvzf $TGZ --strip-components 1 -C $DEST
```
where $TGZ is the path to the downloaded zipped tarball. For Pandoc versions before 2.0, which don't provide a tarball, try instead
```
ar p $DEB data.tar.gz | tar xvz --strip-components 2 -C $DEST
```
You can also install from source, using the instructions below under [Compiling from source]. Note that most distros have the Haskell platform in their package repositories. For example, on Debian/Ubuntu, you can install it with apt-get install haskell-platform.
For PDF output, you'll need LaTeX. We recommend installing TeX Live via your package manager. (On Debian/Ubuntu, apt-get install texlive.)

BSD

Pandoc is in the NetBSD and FreeBSD ports repositories.

Compiling from source

If for some reason a binary package is not available for your platform, or if you want to hack on pandoc or use a non-released version, you can install from source.

Getting the pandoc source code

Source tarballs can be found at https://hackage.haskell.org/package/pandoc. For example, to fetch the source for version 1.17.0.3:

wget https://hackage.haskell.org/package/pandoc-1.17.0.3/pandoc-1.17.0.3.tar.gz
tar xvzf pandoc-1.17.0.3.tar.gz
cd pandoc-1.17.0.3

Or you can fetch the development code by cloning the repository:

git clone https://github.com/jgm/pandoc
cd pandoc

Note: there may be times when the development code is broken or depends on other libraries which must be installed separately. Unless you really know what you're doing, install the last released version.

Quick stack method

The easiest way to build pandoc from source is to use stack:

Install stack. Note that Pandoc requires stack >= 1.6.0.
Change to the pandoc source directory and issue the following commands:
```
stack setup
stack install
```
stack setup will automatically download the ghc compiler if you don't have it. stack install will install the pandoc executable into ~/.local/bin, which you should add to your PATH. This process will take a while, and will consume a considerable amount of disk space.

Quick cabal method

Install the Haskell platform. This will give you GHC and the cabal-install build tool. Note that pandoc requires GHC >= 7.10 and cabal >= 2.0.
Update your package database:
```
cabal update
```
Check your cabal version with
```
cabal --version
```
If you have a version less than 2.0, install the latest with:
```
cabal install cabal-install
```
Use cabal to install pandoc and its dependencies:
```
cabal install pandoc
```
This procedure will install the released version of pandoc, which will be downloaded automatically from HackageDB.

If you want to install a modified or development version of pandoc instead, switch to the source directory and do as above, but without the 'pandoc':
```
cabal install
```
Make sure the $CABALDIR/bin directory is in your path. You should now be able to run pandoc:
```
pandoc --help
```
Not sure where $CABALDIR is?
If you want to process citations with pandoc, you will also need to install a separate package, pandoc-citeproc. This can be installed using cabal:
```
cabal install pandoc-citeproc
```
By default pandoc-citeproc uses the "i;unicode-casemap" method to sort bibliography entries (RFC 5051). If you would like to use the locale-sensitive unicode collation algorithm instead, specify the unicode_collation flag:
```
cabal install pandoc-citeproc -funicode_collation
```
Note that this requires the text-icu library, which in turn depends on the C library icu4c. Installation directions vary by platform. Here is how it might work on macOS with homebrew:
```
brew install icu4c
cabal install --extra-lib-dirs=/usr/local/Cellar/icu4c/51.1/lib \
  --extra-include-dirs=/usr/local/Cellar/icu4c/51.1/include \
  -funicode_collation text-icu pandoc-citeproc
```
The pandoc.1 man page will be installed automatically. cabal shows you where it is installed: you may need to set your MANPATH accordingly. If MANUAL.txt has been modified, the man page can be rebuilt: make man/pandoc.1.

The pandoc-citeproc.1 man page will also be installed automatically.

Custom cabal method

This is a step-by-step procedure that offers maximal control over the build and installation. Most users should use the quick install, but this information may be of use to packagers. For more details, see the Cabal User's Guide. These instructions assume that the pandoc source directory is your working directory. You will need cabal version 2.0 or higher.

Install dependencies: in addition to the Haskell platform, you will need a number of additional libraries. You can install them all with
```
cabal update
cabal install --only-dependencies
```
Configure:
```
cabal configure --prefix=DIR --bindir=DIR --libdir=DIR \
  --datadir=DIR --libsubdir=DIR --datasubdir=DIR --docdir=DIR \
  --htmldir=DIR --program-prefix=PREFIX --program-suffix=SUFFIX \
  --mandir=DIR --flags=FLAGSPEC --enable-tests
```
All of the options have sensible defaults that can be overridden as needed.

FLAGSPEC is a list of Cabal configuration flags, optionally preceded by a - (to force the flag to false), and separated by spaces. Pandoc's flags include:
- embed_data_files: embed all data files into the binary (default no). This is helpful if you want to create a relocatable binary.
- https: enable support for downloading resources over https (using the http-client and http-client-tls libraries).
Build:
```
cabal build
cabal test
```

Build API documentation:

cabal haddock --html-location=URL --hyperlink-source

Copy the files:
```
cabal copy --destdir=PATH
```
The default destdir is /.
Register pandoc as a GHC package:
```
cabal register
```
Package managers may want to use the --gen-script option to generate a script that can be run to register the package at install time.

Creating a relocatable binary

It is possible to compile pandoc such that the data files pandoc uses are embedded in the binary. The resulting binary can be run from any directory and is completely self-contained. With cabal, add -fembed_data_files to the cabal configure or cabal install commands.

With stack, use --flag pandoc:embed_data_files.

Running tests

Pandoc comes with an automated test suite. To run with cabal, cabal test; to run with stack, stack test.

To run particular tests (pattern-matching on their names), use the -p option:

cabal install pandoc --enable-tests
cabal test --test-options='-p markdown'

Or with stack:

stack test --test-arguments='-p markdown'

It is often helpful to add -j4 (run tests in parallel) and --hide-successes (don't clutter output with successes) to the test arguments as well.

If you add a new feature to pandoc, please add tests as well, following the pattern of the existing tests. The test suite code is in test/test-pandoc.hs. If you are adding a new reader or writer, it is probably easiest to add some data files to the test directory, and modify test/Tests/Old.hs. Otherwise, it is better to modify the module under the test/Tests hierarchy corresponding to the pandoc module you are changing.

Running benchmarks

To build and run the benchmarks:

cabal configure --enable-benchmarks && cabal build
cabal bench

or with stack:

stack bench

To use a smaller sample size so the benchmarks run faster:

cabal bench --benchmark-options='-s 20'

To run just the markdown benchmarks:

cabal bench --benchmark-options='markdown'

from https://github.com/jgm/pandoc/blob/master/INSTALL.md

(https://pandoc.org/installing.html)

-----------------------------------------------------------

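I found pandoc, a small tool that quickly converts Markdown to Word format, and it works very well. For example, given a file named 毕业论文.md, I only need to run

pandoc 毕业论文.md -o 毕业论文.docx

on the command line to generate a new docx file from the md file.

The number of formats pandoc can convert between is astonishing.

Link to the Pandoc main site

Installer download link: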

https://github.com/jgm/pandoc/releases/download/3.1.2/pandoc-3.1.2-windows-x86_64.msi

---------------------------------------------------------------------------------

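Using pandoc to convert to PDF and docx with bookmarks and a table of contents

I have described the kind of Markdown editor I want. Typora is excellent but not open source. MacDown is very good and open source. After upgrading its bundled mermaid and viz.js it feels quite good; the one remaining gap is that its exported PDF has no TOC (a TOC can only be placed within the page), so I looked into converting with pandoc to see how well that works.
pandoc
The pandoc website introduces it in one sentence:
If you need to convert files from one markup format into another, pandoc is your swiss-army knife.
It can convert between a great many file formats.
What I care about most is converting md to PDF and md to Word.
Typora reportedly converts our source into its own intermediate format for export. Apart from PDF and HTML, though, its exports are done through pandoc, and its PDF export looks superb.
Abstract syntax tree
We can generate a JSON representation of the abstract syntax tree with this command:
pandoc -t json <input file>
For example, my test file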
# header
你好
```mermaid
graph TB;
a --> b;
```
Output:
{
   "pandoc-api-version" : [
      1,
      20
   ],
   "meta" : {},
   "blocks" : [
      {
         "c" : [
            1,
            [
               "header",
               [],
               []
            ],
            [
               {
                  "c" : "header",
                  "t" : "Str"
               }
            ]
         ],
         "t" : "Header"
      },
      {
         "c" : [
            {
               "t" : "Str",
               "c" : "你好"
            }
         ],
         "t" : "Para"
      },
      {
         "c" : [
            [
               "",
               [
                  "mermaid"
               ],
               []
            ],
            "graph TB;\na --> b;"
         ],
         "t" : "CodeBlock"
      }
   ]
}
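A pandoc AST consists of a meta block (holding metadata such as title, author, and date) and a list of Block elements.
In our example there are three Block elements: Header, Para, and CodeBlock. Header and Para each carry a list of Inline elements (here Str) as their content.
A quick look at how CodeBlock is laid out in the AST; it has two parts:
Attr has three components: (identifier, [classes], [(key, value)]), that is, an identifier, a list of classes, and a list of key-value pairs.
Text is the code itself, which pandoc stores as a Unicode Text value.
In our md file, the language label on the code fence ended up in classes.
For reference, there is a filter written in Python that calls the external mermaid-cli to do the rendering:
pandoc-mermaid-filter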
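To make the CodeBlock layout concrete, here is a minimal sketch that walks the JSON AST shown above and collects the text of code blocks carrying a given class. The function name `find_code_blocks` is an assumption for illustration, and the sketch only locates the blocks; it does not call mermaid-cli or reproduce what pandoc-mermaid-filter does.

```python
def find_code_blocks(node, cls):
    """Recursively collect the text of CodeBlock elements that carry class `cls`."""
    found = []
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            # CodeBlock content is [Attr, Text]; Attr is [identifier, classes, key-values].
            (identifier, classes, keyvals), text = node["c"]
            if cls in classes:
                found.append(text)
        else:
            for value in node.values():
                found.extend(find_code_blocks(value, cls))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_code_blocks(item, cls))
    return found


# The AST from the example output above, written as a Python literal.
doc = {
    "pandoc-api-version": [1, 20],
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["header", [], []], [{"t": "Str", "c": "header"}]]},
        {"t": "Para", "c": [{"t": "Str", "c": "你好"}]},
        {"t": "CodeBlock", "c": [["", ["mermaid"], []], "graph TB;\na --> b;"]},
    ],
}

for source in find_code_blocks(doc, "mermaid"):
    print(source)
```

In practice the dictionary would come from `json.load` applied to the output of `pandoc -t json`.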
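Basic syntax
pandoc -s -f gfm -t pdf -o outputfile
-f FORMAT, -r FORMAT, --from=FORMAT, --read=FORMAT: input format
-t FORMAT, -w FORMAT, --to=FORMAT, --write=FORMAT: output format
-o FILE: output file
-s, --standalone: produce output with an appropriate header and footer. This option is set automatically for pdf, epub, epub3, fb2, and docx output.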
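When pandoc is driven from a script, the options above map naturally onto an argument list. The following is a small sketch under assumptions: the helper names `pandoc_argv` and `run_pandoc` and the file names `notes.md` and `notes.html` are made up for illustration, and only the `-f`, `-t`, `-o`, and `-s` options described above are used.

```python
import shutil
import subprocess


def pandoc_argv(source, output, from_format=None, to_format=None, standalone=False):
    """Build a pandoc command line from the basic options described above."""
    argv = ["pandoc", source, "-o", output]
    if from_format:
        argv += ["-f", from_format]
    if to_format:
        argv += ["-t", to_format]
    if standalone:
        argv.append("-s")
    return argv


def run_pandoc(argv):
    """Run the command if pandoc is on PATH; return False when it is not installed."""
    if shutil.which(argv[0]) is None:
        return False
    subprocess.run(argv, check=True)  # raises CalledProcessError if pandoc fails
    return True


# Only builds and prints the command; call run_pandoc(argv) to execute it.
argv = pandoc_argv("notes.md", "notes.html", from_format="gfm", to_format="html", standalone=True)
print(" ".join(argv))
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces or non-ASCII characters.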
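Creating a PDF
The simplest command is:
pandoc test.txt -o test.pdf
By default pandoc uses LaTeX to create the PDF, which requires that a LaTeX engine be installed first. It can also use ConTeXt, roff ms, or HTML as the intermediate format. To do that, give the output file a .pdf extension as before, then add the --pdf-engine option or -t context, -t html, or -t ms. The tool used to generate the PDF from the intermediate format is specified with --pdf-engine.
pandoc -V 'CJKmainfont=Songti TC' -V mainfont=Menlo --from gfm --listings --pdf-engine=xelatex
The style of the PDF can be controlled with variables, depending on the intermediate format used: see variables for LaTeX, variables for ConTeXt, variables for wkhtmltopdf, and variables for ms. When HTML is the intermediate format, the output can be styled with --css.
To debug PDF generation, it helps to look at the intermediate representation: instead of -o test.pdf, use -s -o test.tex to output the generated LaTeX, then test it with pdflatex test.tex.
When LaTeX is used, the following packages must be available (nearly all are included in current TeX distributions): amsfonts, amsmath, lm, unicode-math, ifxetex, ifluatex, listings (if the --listings option is used), fancyvrb, longtable, booktabs, graphicx (if the document contains images), hyperref, xcolor, ulem, geometry (with the geometry variable set), setspace (with linestretch), and babel (with lang).
The xelatex and lualatex engines require fontspec. xelatex uses polyglossia (with lang), xecjk, and bidi (with the dir variable set).
If the mathspec variable is set, xelatex will use mathspec instead of unicode-math.
The upquote and microtype packages are used if available, and csquotes will be used for typography if the csquotes variable or metadata field is set to a true value.
The following packages are used to improve output quality if present, but pandoc does not require them: upquote (for straight quotes in verbatim environments), microtype (for better spacing adjustments), parskip (for better inter-paragraph spaces), xurl (for better line breaks in URLs), bookmark (for better PDF bookmarks), and footnotehyper or footnote (to allow footnotes in tables).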
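--pdf-engine
Several PDF engines are available:
pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, context, and pdfroff
If the engine is not on your PATH, specify its full path. If this option is not given, pandoc picks a default engine based on the output format specified with -t: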
-t latex or none: pdflatex (other options: xelatex, lualatex, tectonic, latexmk)
-t context: context
-t html: wkhtmltopdf (other options: prince, weasyprint)
-t ms: pdfroff
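The defaults listed above amount to a small lookup table. The sketch below mirrors that list exactly as written in this post; the names `PDF_ENGINES`, `default_pdf_engine`, and `engine_fits_format` are hypothetical, and the defaults may differ in other pandoc versions.

```python
# Intermediate format -> (default engine, other engines), copied from the list above.
PDF_ENGINES = {
    "latex": ("pdflatex", ["xelatex", "lualatex", "tectonic", "latexmk"]),
    "context": ("context", []),
    "html": ("wkhtmltopdf", ["prince", "weasyprint"]),
    "ms": ("pdfroff", []),
}


def default_pdf_engine(to_format=None):
    """Return the default engine for a -t value; no -t behaves like -t latex."""
    key = to_format or "latex"
    if key not in PDF_ENGINES:
        raise ValueError(f"no PDF engine listed for -t {key}")
    return PDF_ENGINES[key][0]


def engine_fits_format(engine, to_format=None):
    """Check whether an engine appears in the list for the given intermediate format."""
    default, others = PDF_ENGINES[to_format or "latex"]
    return engine == default or engine in others


print(default_pdf_engine())        # engine used when -t is omitted
print(default_pdf_engine("html"))  # engine used with -t html
```

A check like `engine_fits_format("xelatex", "html")` returning False is a cheap way to catch a mismatched `--pdf-engine` before invoking pandoc.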
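--toc, --table-of-contents
Include an automatically generated table of contents (or, for latex, context, docx, odt, opendocument, rst, or ms, an instruction to create one). This option has no effect unless -s/--standalone is used, and it has no effect on man, docbook4, docbook5, or jats output.
When producing a PDF via ms, the table of contents appears at the beginning of the document, before the title. To place it at the end of the document instead, use --pdf-engine-opt=--no-toc-relocation.
--toc-depth=NUMBER
Specify the number of section levels to include in the table of contents. The default is 3.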
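As a rough picture of what the depth setting selects, the sketch below lists the Header blocks of a JSON AST whose level does not exceed a given depth. It is only an approximation for illustration: the names `inline_text` and `toc_entries` and the sample headings are made up, only top-level headers and the Str and Space inlines are handled, and pandoc's own table-of-contents generation does more than this.

```python
def inline_text(inlines):
    """Flatten a list of Inline elements to text (only Str and Space are handled)."""
    parts = []
    for element in inlines:
        if element["t"] == "Str":
            parts.append(element["c"])
        elif element["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def toc_entries(blocks, depth=3):
    """Return (level, title) for each top-level Header with level <= depth."""
    entries = []
    for block in blocks:
        if block["t"] == "Header":
            level, _attr, inlines = block["c"]
            if level <= depth:
                entries.append((level, inline_text(inlines)))
    return entries


# Hypothetical document body with headers at levels 1, 2, and 4.
blocks = [
    {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "text"}]},
    {"t": "Header", "c": [2, ["setup", [], []], [{"t": "Str", "c": "Setup"}]]},
    {"t": "Header", "c": [4, ["deep-detail", [], []],
                          [{"t": "Str", "c": "Deep"}, {"t": "Space"}, {"t": "Str", "c": "detail"}]]},
]

for level, title in toc_entries(blocks):
    print("  " * (level - 1) + title)
```

With the default depth of 3 the level-4 heading is left out; passing `depth=4` includes it.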
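MacTeX
I could no longer find the package with brew, so the alternative is to install MacTeX, which is rather large. The pandoc documentation therefore offers a suggestion:
By default pandoc uses LaTeX to create PDFs. Because a full MacTeX installation uses 4 GB of disk space, we recommend BasicTeX or TinyTeX, using the tlmgr tool to install additional packages as needed. If you get warnings about fonts not being found, try:
> tlmgr install collection-fontsrecommended
BasicTeX
Install it directly with brew:
brew cask install basictex
The installation directory is
/usr/local/texlive/2019basic
After that many commands become available, for example pdflatex, xelatex, and luatex. Let's try them.
First install two dependencies:
tlmgr install titling lastpage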
Chinese fonts
Use the command fc-list :lang=zh (from the fontconfig package) to see which Chinese fonts are installed:
/System/Library/Assets/com_apple_MobileAsset_Font5/b2d7b382c0fbaa5777103242eb048983c40fb807.asset/AssetData/Kaiti.ttc: Kaiti TC,楷體\-繁,楷体\-繁:style=Bold,粗體,粗体
/System/Library/Assets/com_apple_MobileAsset_Font5/1183acef85eb1efe456a14378a2eb985c09768c9.asset/AssetData/Lantinghei.ttc: Lantinghei TC,蘭亭黑\-繁,兰亭黑\-繁:style=Extralight,纖黑,纤黑
/System/Library/Assets/com_apple_MobileAsset_Font5/940db29a0ab220999d9a1dbe3eb0819a718057b5.asset/AssetData/Libian.ttc: Libian SC,隸變\-簡,隶变\-简:style=Regular,標準體,常规体
/System/Library/Fonts/STHeiti Medium.ttc: Heiti SC,黑體\-簡,黒体\-簡,Heiti\-간체,黑体\-简:style=中黑,Medium,Halbfett,Normaali,Moyen,Medio,ミディアム,중간체,Médio,Средний,Normal,中等,Media
/System/Library/Assets/com_apple_MobileAsset_Font5/db09870736c6892b6a56035428f2b1b6d0a954fd.asset/AssetData/WawaTC-Regular.otf: Wawati TC,娃娃體\-繁,娃娃体\-繁:style=Regular,標準體,常规体
/System/Library/Assets/com_apple_MobileAsset_Font5/ce85149bd68e9f8b
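The family name to pass as a font variable is the part between the file path and `:style=`. Here is a minimal sketch for pulling it out of one line of the output above; `parse_fc_list_line` is a hypothetical helper, and it assumes the `path: families:style=styles` layout seen in this listing, with `\-` as an escaped hyphen.

```python
def parse_fc_list_line(line):
    """Split one fc-list output line into path, family names, and style names."""
    path, _, rest = line.partition(": ")
    families_part, _, styles_part = rest.partition(":style=")

    def unescape(name):
        # fc-list writes a literal hyphen inside a name as "\-".
        return name.replace("\\-", "-")

    return {
        "path": path,
        "families": [unescape(name) for name in families_part.split(",") if name],
        "styles": [unescape(name) for name in styles_part.split(",") if name],
    }


# One line copied from the listing above.
line = (
    r"/System/Library/Assets/com_apple_MobileAsset_Font5/"
    r"940db29a0ab220999d9a1dbe3eb0819a718057b5.asset/AssetData/Libian.ttc: "
    r"Libian SC,隸變\-簡,隶变\-简:style=Regular,標準體,常规体"
)

font = parse_fc_list_line(line)
print(font["families"][0])
```

The first family name is the one typically used on the command line, as in `-V CJKmainfont='Heiti SC'`.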
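Usage example
pandoc 年终总结.md -o srs.pdf --pdf-engine=xelatex -V CJKmainfont='Heiti SC'
Most of the examples I found online use mainfont, which failed for me; only CJKmainfont worked, which was quite a trap.
The reason is that the templates used in those examples differ from the default template. The default templates live under the pandoc directory; for my brew-installed pandoc they are in:
/usr/local/Cellar/pandoc/2.8.1/share/x86_64-osx-ghc-8.8.1/pandoc-2.8.1/data/templates
and these templates set the CJK font through the CJKmainfont variable.
That completes converting md to PDF. One question remains:
How do I get the graphviz and mermaid diagrams in my md file rendered in the PDF?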
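Templates
When the -s/--standalone option is used, pandoc uses a template to add the header and footer material needed for a self-standing document. To see the default template that is used, type:
pandoc -D *FORMAT*
where FORMAT is the name of the output format.
For example:
pandoc -D latex
A custom template can be specified with --template. Alternatively, the system default template for a given output format can be overridden by placing a file templates/default.*FORMAT* in the user data directory (shown by pandoc --version). (As for where the system default templates live: with my brew install they are under /usr/local/Cellar/pandoc/2.8.1/share/x86_64-osx-ghc-8.8.1/pandoc-2.8.1/data/templates.)
There are a few exceptions:
For odt output, customize the default.opendocument template.
For pdf output, customize the default.latex template (or default.context when using -t context, default.ms when using -t ms, or default.html when using -t html).
docx and pptx have no template. They use what is called a reference document instead, and the output is formatted mainly by following the styles in that file.
Templates contain variables, which can be set on the command line with -V/--variable. If a variable is not set, pandoc looks for it in the document's metadata, which can be set with a YAML metadata block or with -M/--metadata. Some variables are given default values by pandoc; see the Variables section.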
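LaTeX template syntax
Comments
Anything from $-- to the end of the line is treated as a comment.
Delimiters
In a template, either $...$ or ${...} may be used as delimiters to mark variables and control structures. The two styles can be mixed in one template, but the opening and closing delimiters of each instance must match. The opening delimiter may be followed by spaces or tabs, which are ignored.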
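The comment and delimiter rules, together with the lookup order from the Templates section (a value set with -V first, then document metadata), can be imitated in a few lines. This is a toy renderer for illustration only, not pandoc's template engine: the `render` function is hypothetical, it supports plain variable substitution only (no conditionals, loops, or escaped dollar signs), and ignoring whitespace before the closing delimiter is an assumption on my part.

```python
import re

# "$--" starts a comment that runs to the end of the line.
COMMENT = re.compile(r"\$--.*$", re.MULTILINE)
# ${name} or $name$, allowing spaces or tabs just inside the delimiters.
VARIABLE = re.compile(
    r"\$\{[ \t]*([A-Za-z][\w.-]*)[ \t]*\}"
    r"|\$[ \t]*([A-Za-z][\w.-]*)[ \t]*\$"
)


def render(template, variables, metadata):
    """Substitute variables, preferring -V style values over document metadata."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        if name in variables:
            return str(variables[name])
        return str(metadata.get(name, ""))  # unset variables render as empty text

    return VARIABLE.sub(lookup, COMMENT.sub("", template))


template = "$-- header comment\nTitle: $title$\nBy ${ author }\nFont: $mainfont$"
variables = {"title": "From -V", "mainfont": "Menlo"}        # as if passed with -V
metadata = {"title": "From YAML", "author": "Aristotle"}     # as if from a YAML block

print(render(template, variables, metadata))
```

Here `title` comes from the command-line style variables even though the metadata also defines it, while `author` falls back to the metadata.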
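PDF slides
Just add -t beamer when converting to PDF.
Variables
Metadata variables
title, author, date: basic identifying information for the document. These are included in PDF metadata through LaTeX and ConTeXt. They can be set through a pandoc title block or a YAML metadata block: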
---
author:
- Aristotle
- Peter Abelard
...
Note: if you only want to set PDF or HTML metadata, you do not need to include such a block in the document; just set the title-meta, author-meta, and date-meta variables. (By default these are set automatically from title, author, and date.)
subtitle: document subtitle, used in HTML, EPUB, LaTeX, ConTeXt, and docx documents
abstract: document summary, used in LaTeX, ConTeXt, AsciiDoc, and docx documents
keywords: list of keywords, used in HTML, PDF, ODT, pptx, docx, and AsciiDoc documents
subject: document subject, used in ODT, PDF, docx, and pptx metadata
description: document description, used in ODT, docx, and pptx metadata
category: document category, used in docx and pptx metadata
Language variables
lang: a BCP 47 language tag
dir: text direction, either rtl or ltr
Beamer slides variables
These variables change the appearance of PDF slides produced with beamer.
aspectratio: slide aspect ratio (43 for 4:3 [default], 169 for 16:9, 1610 for 16:10, 149 for 14:9, 141 for 1.41:1, 54 for 5:4, 32 for 3:2)
beamerarticle: produce an article from Beamer slides
beameroption: add extra beamer options with \setbeameroption{}
institute: author affiliations; can be a list when there are multiple authors
logo: logo image for the slides
navigation: controls navigation symbols (empty for no navigation symbols; other values are frame, vertical, and horizontal)
section-titles: start each new section on a new page; enabled by default
theme, colortheme, fonttheme, innertheme, outertheme: beamer themes
themeoptions: options for LaTeX beamer themes (a list)
titlegraphic: image for the title slide
LaTeX variables
Layout
block-headings: make \paragraph and \subparagraph (fourth- and fifth-level headings, or fifth- and sixth-level with book classes) free-standing rather than run-in; requires further formatting to distinguish from \subsubsection (third- or fourth-level headings). Instead of using this option, KOMA-Script can adjust headings more extensively:
---
documentclass: scrartcl
header-includes: |
  \RedeclareSectionCommand[
    beforeskip=-10pt plus -2pt minus -1pt,
    afterskip=1sp plus -1sp minus 1sp,
    font=\normalfont\itshape]{paragraph}
  \RedeclareSectionCommand[
    beforeskip=-10pt plus -2pt minus -1pt,
    afterskip=1sp plus -1sp minus 1sp,
    font=\normalfont\scshape,
    indent=0pt]{subparagraph}
...
classoption: option for the document class, e.g. oneside; repeat for multiple options:
---
classoption:
- twocolumn
- landscape
...
documentclass: usually one of the standard classes book, article, or report; the KOMA-Script equivalents scrartcl, scrbook, and scrreprt, which default to smaller margins; or memoir
geometry: options for the geometry package; repeat for multiple options:
---
geometry:
- top=30mm
- left=20mm
- heightrounded
...
hyperrefoptions: options for the hyperref package, e.g. linktoc=all; repeat for multiple options:
---
hyperrefoptions:
- linktoc=all
- pdfwindowui
- pdfpagemode=FullScreen
...
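indent: use the document class settings for indentation (the default LaTeX template otherwise removes indentation and adds space between paragraphs)
linestretch: adjust line spacing using the [setspace](https://ctan.org/pkg/setspace) package, e.g. 1.25 or 1.5
margin-left, margin-right, margin-top, margin-bottom: set margins; these are used if geometry is not used
pagestyle: controls \pagestyle{}: the default article class supports plain (default), empty (no running heads or page numbers), and headings (section titles in running heads)
papersize: paper size, e.g. letter or a4
secnumdepth: numbering depth for sections (requires the --number-sections/-N option or the numbersections variable)
Fonts
fontenc: font encoding, specified through the fontenc package (with pdflatex); the default is T1 (see the LaTeX font encodings guide)
fontfamily: font package for use with pdflatex. TeX Live includes many options, documented in the [LaTeX Font Catalogue](https://tug.org/FontCatalogue/). The default is Latin Modern.
fontfamilyoptions: options for the package used as fontfamily; repeat for multiple options: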
---
fontfamily: libertinus
fontfamilyoptions:
- osf
- p
...
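fontsize: font size for body text. The standard values are 10pt, 11pt, and 12pt. To use another size, set documentclass to one of the KOMA-Script classes, such as scrartcl or scrbook.
mainfont, sansfont, monofont, mathfont, CJKmainfont: fonts for use with xelatex or lualatex: the name of any system font, loaded through the fontspec package. CJKmainfont uses the xecjk package.
mainfontoptions, sansfontoptions, monofontoptions, mathfontoptions, CJKoptions: options for the fonts above; repeat for multiple options: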
---
mainfont: TeX Gyre Pagella
mainfontoptions:
- Numbers=Lowercase
- Numbers=Proportional
...
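microtypeoptions: options to pass to the microtype package
Links
colorlinks: add color to link text; automatically enabled if any of linkcolor, filecolor, citecolor, urlcolor, or toccolor are set
linkcolor, filecolor, citecolor, urlcolor, toccolor: colors for internal links, external links, citation links, linked URLs, and links in the table of contents, respectively
links-as-notes: print links as footnotes
Front matter
lof, lot: include a list of figures and a list of tables
thanks: contents of the acknowledgments footnote after the document title
toc: include a table of contents (can also be set with --toc/--table-of-contents)
toc-depth: depth of section levels to include in the table of contents
BibLaTeX Bibliographies
These variables take effect when BibLaTeX is used for citation rendering.
biblatexoptions: list of options for biblatex
biblio-style: bibliography style, when used with --natbib and --biblatex
biblio-title: bibliography title, when used with --natbib and --biblatex
bibliography: bibliography to use for resolving references
natbiboptions: list of options for natbib
raw attribute
This is an extension: raw_attribute
A code block marked with a special attribute of the following form is treated as raw content in the named format.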
```{=openxml}
<w:p>
  <w:r>
    <w:br w:type="page"/>
  </w:r>
</w:p>
```

The code above inserts a page break in docx output.

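The format name (openxml here) must match the output format. The correspondence is:

- openxml for docx output
- opendocument for odt output
- html5 for epub3 output
- html4 for epub2 output
- latex, beamer, ms, or html5 for pdf output (depending on the --pdf-engine used)

> A raw attribute cannot be combined with regular attributes.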
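The correspondence above can be kept in a small table when raw blocks are generated by a script. The sketch below is an illustration under assumptions: the names `RAW_FORMATS` and `raw_block` are hypothetical, the table simply restates the list above, and the fence is built from single backticks so that the example can itself sit inside a fenced block.

```python
# Output format -> raw format names accepted for it, restating the list above.
RAW_FORMATS = {
    "docx": ["openxml"],
    "odt": ["opendocument"],
    "epub3": ["html5"],
    "epub2": ["html4"],
    "pdf": ["latex", "beamer", "ms", "html5"],  # depends on the --pdf-engine used
}

FENCE = "`" * 3  # three backticks


def raw_block(output_format, content, raw_format=None):
    """Wrap content in a fenced raw block whose format suits the output format."""
    allowed = RAW_FORMATS[output_format]
    chosen = raw_format or allowed[0]
    if chosen not in allowed:
        raise ValueError(f"{chosen} is not a raw format for {output_format} output")
    return f"{FENCE}{{={chosen}}}\n{content}\n{FENCE}"


page_break = '<w:p>\n  <w:r>\n    <w:br w:type="page"/>\n  </w:r>\n</w:p>'
print(raw_block("docx", page_break))
```

Asking for a mismatched pair, such as `raw_block("docx", "...", raw_format="latex")`, raises an error instead of emitting a block the docx writer would skip.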



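# Filters

Pandoc provides an interface for users to write programs (known as filters) that act on pandoc's AST (abstract syntax tree).

Pandoc consists of a set of readers and writers. When converting a document from one format to another, a reader first parses the input into pandoc's intermediate representation of the document, an abstract syntax tree, which a writer then converts into the target format. The AST is defined in the module [`Text.Pandoc.Definition` in the `pandoc-types` package](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html).

A filter is a program that modifies the AST:
INPUT --reader--> AST --filter--> AST --writer--> OUTPUT

Filters work as pipes: they read from standard input and write to standard output, consuming and producing a JSON representation of the pandoc AST. A filter may be written in any programming language. To use one, simply specify it on the command line:
pandoc -s input.txt --filter pandoc-citeproc -o output.html

There is also a list of third-party filters: [list of third party filters on the wiki](https://github.com/jgm/pandoc/wiki/Pandoc-Filters).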



                     source format
                          ↓
                       (pandoc)
                          ↓
                  JSON-formatted AST
                          ↓
                       (filter)
                          ↓
                  JSON-formatted AST
                          ↓
                       (pandoc)
                          ↓
                    target format
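The JSON round trip in the diagram can be sketched with nothing but the standard library. This is a minimal illustration rather than a recommended filter: the names `walk`, `shout`, and `run_filter` are made up, the transformation (upper-casing every Str element) is arbitrary, and the demo feeds an in-memory document instead of real pandoc output.

```python
import io
import json
import sys


def walk(node, action):
    """Rebuild the tree, replacing any element for which action returns a new element."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:  # AST elements are objects with a "t" (type) key
            replaced = action(node)
            if replaced is not None:
                return replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node


def shout(element):
    """Example action: upper-case the text of every Str element."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None


def run_filter(instream, outstream):
    """Read a JSON AST from instream, transform it, and write JSON to outstream."""
    json.dump(walk(json.load(instream), shout), outstream)


# Demo on an in-memory document; a real filter would call run_filter(sys.stdin, sys.stdout).
doc = {"pandoc-api-version": [1, 20], "meta": {},
       "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "hello"}]}]}
out = io.StringIO()
run_filter(io.StringIO(json.dumps(doc)), out)
print(out.getvalue())
```

Saved as a script that ends with `run_filter(sys.stdin, sys.stdout)` and made executable, it could be passed to pandoc with `--filter`.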

To write a filter in Python, you can use the *pandocfilters* package:

```sh
pip install pandocfilters
```
