PyMarkdown is primarily a Markdown Linter. To ensure that the Markdown linting is accomplished successfully, the rules engine that powers the linter uses a Markdown parser that is both GitHub Flavored Markdown compliant and CommonMark compliant. The rules provided in the base application can be easily extended by writing new plugins and importing them into the rules engine through simple configuration options.
The PyMarkdown project has the following advantages:
- Consistency
- This project can examine multiple files and directories with one invocation to ensure that all detected Markdown files adhere to the same set of guidelines.
- Portable
- The linter runs on any system running Python 3.8 or later with no modifications.
- Standardized
- As the parser that powers the linter is GitHub Flavored Markdown (GFM) compliant, it does not have to "guess" how some parsers may handle a given situation. It follows the clear rules provided by the specification on how to parse the Markdown.
- Accurate
- The parser passes all GFM conformance tests and CommonMark conformance tests. If there was any doubt as to how a block of Markdown should be parsed, the CommonMark 0.29.2 release was used to determine the correct parsing.
- Flexible
- Each Markdown document is parsed into an internal token format, and rules are preferably written against that format. Where that is not possible, simple regular expressions and algorithms are used on a line-by-line basis.
- Thoroughly tested
- The project currently has over 2700 scenario tests and coverage percentages all over 99%.
- Extensible
- The parser for the project adheres to the GFM specification, and most of the rules leverage the tokens produced by that parser. The rules themselves are implemented as plugins, so they are extensible by default. The parser itself will be extended as needed to support other Markdown features.
This project is currently in pre-release, and some of the documented features may not work exactly as described until the initial release.
This project requires Python 3.8 or later to function.
pip install PyMarkdown
Full help support is available by entering
python main.py --help
on the command line and pressing enter. For any individual command,
help is available by following the command or commands with --help
as follows:
python main.py scan --help
This section requires some examples to illustrate how things work. For the purpose of this section, this documentation will assume that there is a file called example-1.md in a directory called /examples that has the following content:
## This is an example
Just an example.
and a file called example-2.md in that same directory that has the following content:
# This is an example
Just an example.
If you prefer, these files are already checked into the examples directory of the GitHub project.
The PyMarkdown project includes 13 out-of-the-box rules already implemented, with another 29 rules to be added before the 1.0.0 release. These rules are implemented using a simple plugin system that is documented in the developer documentation. It is these rules that allow the PyMarkdown project to examine or scan the various Markdown files, looking for particular patterns that the authors want to consistently enforce over a set of documents.
Note that the 42 rules provided by David Anson's Markdown Lint project were used as the starting point for the initial set of rules. This decision was made to give Markdown authors who use his project in their IDEs (such as the MarkdownLint plugin for VSCode that I use) a good grounding in what they can consistently check for.
The linter is executed by calling the project from the command line and specifying one or more files and directories to scan for Markdown .md files. The set of files and/or directories must be prefaced with the scan keyword to denote that scanning is required. For the examples directory, both this form:
python main.py scan /examples
and this form:
python main.py scan /examples/example-1.md /examples/example-2.md
can be used to scan both files in the directory. The only difference between the two invocations is that the first example will scan every Markdown .md file in the /examples directory, while the second invocation will only scan the two specified files. For clarity purposes, if the command line specifies the same file multiple times, that file name will only be added to the list of files to scan once.
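The file-collection behavior described in the paragraph above can be sketched in a few lines of Python. This is only an illustration and not the project's actual code: the function name collect_markdown_files is hypothetical, and the sketch assumes that a directory contributes only the .md files directly inside it.

```python
from pathlib import Path
from typing import Iterable, List


def collect_markdown_files(paths: Iterable[str]) -> List[str]:
    """Expand files and directories into a de-duplicated list of .md files.

    Hypothetical helper, not part of PyMarkdown itself.
    """
    collected: List[str] = []
    seen = set()
    for raw_path in paths:
        path = Path(raw_path)
        if path.is_dir():
            # Assumption: only .md files directly inside the directory count.
            candidates = sorted(
                entry
                for entry in path.iterdir()
                if entry.is_file() and entry.suffix == ".md"
            )
        elif path.is_file() and path.suffix == ".md":
            candidates = [path]
        else:
            # Anything else (missing path, non-Markdown file) is skipped here.
            candidates = []
        for candidate in candidates:
            key = str(candidate.resolve())
            if key not in seen:  # a file named more than once is added once
                seen.add(key)
                collected.append(str(candidate))
    return collected
```

With this sketch, passing the directory, passing both files, or passing the same file twice all end up with each Markdown file listed exactly once.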
Executing either of the two scan commands above will produce the following output:
/examples/example-1.md:3:16: MD047: Each file should end with a single
newline character. (single-trailing-newline)
The format of the output for any rules that are triggered is as follows:
file-name:line:column: rule-id: description (aliases)
- file-name: Path to the file that triggered the rule.
- line/column: Position in the file where the rule was triggered.
- rule-id: Unique identifier assigned to the rule.
- description: Human readable description of the rule.
- aliases: One or more aliases used to reference the rule.
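If the scan output needs to be consumed by another tool, the format above is regular enough to parse. The following is a minimal sketch, assuming the output always follows the file-name:line:column: rule-id: description (aliases) layout shown here. The names ScanFailure, FAILURE_PATTERN and parse_failure are hypothetical and are not part of the project.

```python
import re
from typing import List, NamedTuple, Optional


class ScanFailure(NamedTuple):
    file_name: str
    line: int
    column: int
    rule_id: str
    description: str
    aliases: List[str]


# Assumption: the aliases are the final parenthesized group on the line.
FAILURE_PATTERN = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<rule>[A-Za-z]+\d+): (?P<description>.*) \((?P<aliases>[^()]*)\)$"
)


def parse_failure(text: str) -> Optional[ScanFailure]:
    """Parse one reported failure, or return None if it does not match."""
    # The samples in this document are wrapped across lines, so collapse
    # all runs of whitespace into single spaces before matching.
    match = FAILURE_PATTERN.match(" ".join(text.split()))
    if match is None:
        return None
    return ScanFailure(
        file_name=match.group("file"),
        line=int(match.group("line")),
        column=int(match.group("column")),
        rule_id=match.group("rule"),
        description=match.group("description"),
        aliases=[alias.strip() for alias in match.group("aliases").split(",")],
    )
```

Returning None for text that does not match keeps the sketch tolerant of any other lines that may appear in the output.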
For this example and the rule violation that was reported, the first step is to look at the file /examples/example-1.md at the end of line 3, which is column 16. Rule md047 specifies that every file should end with a single newline character, which is what is reported in the description. Additionally, it reports that this rule can also be identified by the more human readable alias of single-trailing-newline.
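To make the reported position concrete, here is a simplified sketch of the kind of check that rule md047 describes. It is not the project's implementation: the function name is hypothetical, the sketch only detects a missing final newline, and it assumes that example-1.md has a blank line between the heading and the paragraph, which is consistent with the failure being reported on line 3.

```python
from typing import Optional, Tuple


def find_missing_trailing_newline(text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the end of the last line if the text does
    not end with a newline character, otherwise None.

    Simplification: empty text and text ending in a newline both pass.
    """
    if not text or text.endswith("\n"):
        return None
    lines = text.split("\n")
    # Line numbers are 1-based; the column is the length of the last line.
    return len(lines), len(lines[-1])


# Assumed content of example-1.md, with no newline after the last line.
example_1 = "## This is an example\n\nJust an example."
print(find_missing_trailing_newline(example_1))  # (3, 16)
```

The last line, "Just an example.", is 16 characters long, which is where the column value of 16 in the reported failure comes from.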
For more advanced scanning options, please consult the document on Advanced Scanning.
For information on what rules are currently present, the following command may be used:
python main.py plugins list
This command will list all of the rules in a table in the following format:
rule-id aliases enabled-default enabled-current version
- rule-id: Unique identifier assigned to the rule.
- aliases: One or more aliases used to reference the rule.
- enabled-default: Whether the rule is enabled by default.
- enabled-current: Whether the rule is currently enabled.
- version: Version associated with the rule. If the rule is a project rule, this version will always be the version of the project.
In addition, the list command may be followed by text that specifies a glob pattern used to match against the rules. For example, using the command plugins list md00? produces this output:
ID NAMES ENABLED (DEFAULT) ENABLED (CURRENT) VERSION
md002 first-heading-h1, first False False 0.5.0
-header-h1
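For readers unfamiliar with glob patterns, the matching can be illustrated with Python's standard fnmatch module. This is a sketch under assumptions: it is not confirmed that the project uses fnmatch internally, the sketch matches against rule ids only, and the RULE_IDS list contains just the two rules mentioned in this document.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Only the two rule ids discussed in this document; illustrative data.
RULE_IDS = ["md002", "md047"]


def match_rules(pattern: str, rule_ids: Iterable[str] = RULE_IDS) -> List[str]:
    """Return the rule ids that match the given glob pattern."""
    # "?" matches exactly one character, "*" matches any run of characters.
    return sorted(rule_id for rule_id in rule_ids if fnmatchcase(rule_id, pattern))


print(match_rules("md00?"))  # ['md002']
print(match_rules("md0*"))   # ['md002', 'md047']
```

The pattern md00? matches md002 but not md047, because the fourth character of md047 is 4 and not 0.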
If more verbose information is needed on a given rule, the plugins info command may be used with a specific rule-id or alias used to refer to the plugin. If provided with a rule-id of md047 or an alias of single-trailing-newline, this command will produce the following output:
Id:md047
Name(s):single-trailing-newline
Description:Each file should end with a single newline character.
- Note that better support for this command is prioritized as required for the general release and should happen fairly quickly.
The most frequently used part of the configuration system is the part that enables and disables specific rules while scanning the Markdown files. For example, if you do not like rule md047, which states that each file must end with a single newline, you can disable that rule by specifying:
python main.py -d md047 scan /examples
or:
python main.py --disable-rules md047 scan /examples
The effect of disabling the rule should be evidenced by the scan no longer reporting any violations of rule md047 against the Markdown file example-1.md.
Alternatively, rules can also be enabled. As the base rules for this project are modelled on those of David Anson's project, rule md002 is disabled by default in both projects. Specifically, rule md002 is disabled by default because rule md041 provides a better implementation of that rule that takes front-matter into account. Until rule md041 is implemented, you can enable rule md002 by specifying either:
python main.py -e md002 scan /examples
or
python main.py --enable-rules md002 scan /examples
The effect of enabling the rule should be evidenced by the scan reporting a violation of rule md002 against the Markdown file example-1.md:
examples/example-1.md:1:1: MD002: First heading of the document should
be a top level heading. [Expected: h1; Actual: h2] (first-heading-h1,
first-header-h1)
examples/example-1.md:3:16: MD047: Each file should end with a single
newline character. (single-trailing-newline)
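The way the enable and disable options combine with the defaults can be pictured with a small sketch. It is only a model of the behavior described above, not the project's configuration code: the names are hypothetical, the defaults cover only the two rules discussed here, and applying disables after enables is an assumption.

```python
from typing import Dict, Iterable

# Defaults as described in this document: md002 is off, md047 is on.
DEFAULT_ENABLED: Dict[str, bool] = {"md002": False, "md047": True}


def resolve_enabled_rules(
    defaults: Dict[str, bool],
    enable: Iterable[str] = (),
    disable: Iterable[str] = (),
) -> Dict[str, bool]:
    """Apply enable and disable requests on top of the default states."""
    enabled = dict(defaults)  # copy, so the defaults are left untouched
    for rule_id in enable:
        key = rule_id.lower()
        if key not in enabled:
            raise ValueError(f"unknown rule: {rule_id}")
        enabled[key] = True
    # Assumption: a rule that is both enabled and disabled ends up disabled.
    for rule_id in disable:
        key = rule_id.lower()
        if key not in enabled:
            raise ValueError(f"unknown rule: {rule_id}")
        enabled[key] = False
    return enabled
```

In this model, enabling md002 leaves md047 enabled as well, which matches the output above where both rules report a failure against example-1.md.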
For more advanced configuration options, please consult the document on Advanced Configuration. This document includes information on:
- specifying configuration files
During the development phase of this project, it was more useful to have an actual list of issues to track and prioritize, rather than relying on GitHub to do all of the work. Here is the Issues List.
If you find any issues, please report them using the standard GitHub issues process. When our team takes a look at your issue and triages it, it will be added to our Issues List with the specified priority. For us, this provides transparency as to what we are currently working on, what is up next, and what our future plans are.
If you still have questions, please consult our Frequently Asked Questions document.
The changelog for this project is maintained at this location.
If you would like to report an issue with the linter, a rule, or the documentation, please file an issue using GitHub.
If you would like to help fix a specific issue or do some work to implement a feature that you believe is important, please file an issue that includes what you want to add, why you want to add it, and why it is important.
If you would like to contribute to the project in a more substantial manner, please contact me at jack.de.winter@outlook.com.
See the CONTRIBUTING.md file.
As this is currently a team of one, there are only two big groups of people to acknowledge.
The first, and foremost group, is my immediate family. They have endured me coming out of my office with my head still in the clouds, explaining things to them so that I can think more clearly. While they still do not understand what I am talking about with respect to this project, I am so grateful to them for allowing me to work "my process" to figure things out.
The second group is the contributors to the CommonMark discussion forum. While I have raised some issues that were cut and dry, a lot of them involved a significant amount of discussion to figure out what the right approach is. Through all those discussions, I rarely, if ever, felt like they treated me as less than equal, no matter how stupid my questions were. For their patience and their professionalism, thank you.