vom

vom is a rewrite of nom, which is a parser combinator library. It is written in V, hence the name.

Example

Hexadecimal color parser:

import strconv
import vom { is_hex_digit, tag, take_while_m_n, tuple }

struct Color {
    red   byte
    green byte
    blue  byte
}

fn from_hex(input string) ?byte {
    return byte(strconv.parse_uint(input, 16, 8) ?)
}

fn hex_primary(input string) ?(string, string) {
    parser := take_while_m_n(2, 2, is_hex_digit)
    return parser(input)
}

fn hex_color(input string) ?(string, Color) {
    discard := tag('#')
    hex_part, _ := discard(input) ?
    parser := tuple(hex_primary, hex_primary, hex_primary)
    rest, output := parser(hex_part) ?
    red, green, blue := from_hex(output[0]) ?, from_hex(output[1]) ?, from_hex(output[2]) ?
    return rest, Color{red, green, blue}
}

fn main() {
    _, color := hex_color('#2F14DF') ?
    assert color == Color{47, 20, 223}
}

When will it reach 1.0?

V is still missing some features that I need, or at least want, before I can complete this library:

Generic return type for closures returned from functions

This is the only feature I absolutely need in order to finish this library. Without it, we're stuck with returning ?(string, string) instead of ?(string, T) from each parser, and thus can't construct an AST with the library alone. For now, you have to build the AST manually.
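Building a typed value by hand might look like the sketch below. It relies only on `digit1` from the reference further down; the `Number` struct and the `number` function are hypothetical names made up for illustration.

```v
import vom

// Hypothetical AST node, not part of vom.
struct Number {
    value int
}

// Wraps digit1 and converts its string output into a typed node by hand.
fn number(input string) ?(string, Number) {
    rest, output := vom.digit1(input) ?
    return rest, Number{output.int()}
}

fn main() {
    rest, node := number('42 apples') ?
    assert node.value == 42
    assert rest == ' apples'
}
```

Every typed parser needs a wrapper like this until parsers can return ?(string, T) directly.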

Generic type aliases

This isn't implemented in V yet. It isn't required for the missing features, but without it the codebase will look horrible, because almost all of the functions depend on the following aliases:

type Fn = fn (string) ?(string, string)
type FnMany = fn (string) ?(string, []string)

The second element of each return type needs to be generic, because parsers could return other types such as int, token, []token etc. I could search and replace each entry manually, but I'm too lazy to do that.

Functions that return closures capturing their function parameters

This isn't strictly necessary either, but it would remove a lot of boilerplate from the current code, for instance from sequence.v.

Adding this would turn the following:

pub fn minimal(cond fn (int) bool) fn (int) bool {
    functions := [cond]
    return fn [functions] (input int) bool {
        cond := functions[0]
        return cond(input)
    }
}

Into this:

pub fn minimal(cond fn (int) bool) fn (int) bool {
    return fn [cond] (input int) bool {
        return cond(input)
    }
}

Call closure returned from function immediately

Again, this isn't mandatory for the library to work, but it would be a nice addition. Instead of the following code:

fn operator(input string, location Location) ?(string, Token) {
    parser := alt(tag('+'), tag('-'), tag('<'))
    rest, output := parser(input) ?
    return rest, Token{output, location, .operator}
}

We could write this instead, which is a very common pattern in nom:

fn operator(input string, location Location) ?(string, Token) {
    rest, output := alt(tag('+'), tag('-'), tag('<'))(input) ?
    return rest, Token{output, location, .operator}
}

Install

v install --git https://github.com/knarkzel/vom

Then import it in your file like so:

import vom
rest, output := vom.digit1('123hello') ?
assert output == '123'
assert rest == 'hello'

There are examples in the examples/ folder.

Why use vom?

  • The parsers are small and easy to write
  • The parser components are easy to reuse
  • The parser components are easy to test separately
  • The parser combination code looks close to the grammar you would have written
  • You can build partial parsers, specific to the data you need at the moment, and ignore the rest

Resources

fn all_consuming

fn all_consuming(f Fn) Fn

Succeeds if all the input has been consumed by its child parser.

fn alpha0

fn alpha0(input string) ?(string, string)

Recognizes zero or more lowercase and uppercase ASCII alphabetic characters: a-z, A-Z

fn alpha1

fn alpha1(input string) ?(string, string)

Recognizes one or more lowercase and uppercase ASCII alphabetic characters: a-z, A-Z

fn alphanumeric0

fn alphanumeric0(input string) ?(string, string)

Recognizes zero or more ASCII numerical and alphabetic characters: 0-9, a-z, A-Z

fn alphanumeric1

fn alphanumeric1(input string) ?(string, string)

Recognizes one or more ASCII numerical and alphabetic characters: 0-9, a-z, A-Z

fn alt

fn alt(parsers ...Fn) Fn

Tests a list of parsers one by one until one succeeds.
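As a small illustration, the sketch below tries two `tag` parsers in turn. The helper name `sign` is hypothetical, and the expected values assume `tag` yields the matched pattern as its output, as in the hex color example above.

```v
import vom { alt, tag }

// Hypothetical helper: accepts either a plus or a minus sign.
fn sign(input string) ?(string, string) {
    parser := alt(tag('+'), tag('-'))
    return parser(input)
}

fn main() {
    // tag('+') fails on this input, so alt falls through to tag('-').
    rest, output := sign('-7') ?
    assert output == '-'
    assert rest == '7'
}
```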

fn character

fn character(input string) ?(string, string)

Recognizes one letter.

fn cond

fn cond(b bool, f Fn) Fn

Calls the parser if the condition is met.

fn count

fn count(f Fn, count int) FnMany

Runs the embedded parser a specified number of times.
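A possible use, sketched under the assumption that each run contributes one entry to the resulting array; `two_pairs` is a hypothetical helper name.

```v
import vom { count, tag }

// Hypothetical helper: expects the pattern 'ab' exactly twice.
fn two_pairs(input string) ?(string, []string) {
    parser := count(tag('ab'), 2)
    return parser(input)
}

fn main() {
    rest, output := two_pairs('ababab') ?
    // Two runs are collected, the third 'ab' is left unparsed.
    assert output == ['ab', 'ab']
    assert rest == 'ab'
}
```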

fn crlf

fn crlf(input string) ?(string, string)

Recognizes the string '\r\n'.

fn delimited

fn delimited(first Fn, second Fn, third Fn) Fn

Matches an object from the first parser and discards it, then gets an object from the second parser, and finally matches an object from the third parser and discards it.
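For example, a number wrapped in parentheses could be parsed roughly like this. The helper name `parenthesized` is an assumption made for the example; only `delimited`, `digit1` and `tag` come from vom.

```v
import vom { delimited, digit1, tag }

// Hypothetical helper: parses digits surrounded by parentheses.
fn parenthesized(input string) ?(string, string) {
    parser := delimited(tag('('), digit1, tag(')'))
    return parser(input)
}

fn main() {
    rest, output := parenthesized('(123)abc') ?
    // Both parentheses are matched and discarded.
    assert output == '123'
    assert rest == 'abc'
}
```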

fn digit0

fn digit0(input string) ?(string, string)

Recognizes zero or more ASCII numerical characters: 0-9

fn digit1

fn digit1(input string) ?(string, string)

Recognizes one or more ASCII numerical characters: 0-9

fn eof

fn eof(input string) ?(string, string)

Returns its input if it is at the end of input data

fn fail

fn fail(input string) ?(string, string)

A parser which always fails.

fn fill

fn fill(f Fn, mut buf []string) FnMany

Runs the embedded parser repeatedly, filling the given array with results.

fn hex_digit0

fn hex_digit0(input string) ?(string, string)

Recognizes zero or more ASCII hexadecimal numerical characters: 0-9, A-F, a-f

fn hex_digit1

fn hex_digit1(input string) ?(string, string)

Recognizes one or more ASCII hexadecimal numerical characters: 0-9, A-F, a-f

fn is_a

fn is_a(pattern string) Fn

Returns the longest slice that matches any character in the pattern.

fn is_alphabetic

fn is_alphabetic(b byte) bool

Tests if byte is ASCII alphabetic: A-Z, a-z.

fn is_alphanumeric

fn is_alphanumeric(b byte) bool

Tests if byte is ASCII alphanumeric: A-Z, a-z, 0-9.

fn is_digit

fn is_digit(b byte) bool

Tests if byte is ASCII digit: 0-9.

fn is_hex_digit

fn is_hex_digit(b byte) bool

Tests if byte is ASCII hex digit: 0-9, A-F, a-f.

fn is_newline

fn is_newline(b byte) bool

Tests if byte is ASCII newline: \n.

fn is_not

fn is_not(pattern string) Fn

Parse till certain characters are met.

fn is_oct_digit

fn is_oct_digit(b byte) bool

Tests if byte is ASCII octal digit: 0-7.

fn is_space

fn is_space(b byte) bool

Tests if byte is ASCII space or tab.

fn line_ending

fn line_ending(input string) ?(string, string)

Recognizes an end of line (both '\n' and '\r\n').

fn multispace0

fn multispace0(input string) ?(string, string)

Recognizes zero or more spaces, tabs, carriage returns and line feeds.

fn multispace1

fn multispace1(input string) ?(string, string)

Recognizes one or more spaces, tabs, carriage returns and line feeds.

fn newline

fn newline(input string) ?(string, string)

Matches a newline character '\n'

fn none_of

fn none_of(pattern string) Fn

Recognizes a character that is not in the provided characters.

fn not

fn not(f Fn) Fn

Succeeds if the child parser returns an error.

fn not_line_ending

fn not_line_ending(input string) ?(string, string)

Recognizes a string of any char except '\r\n' or '\n'.

fn oct_digit0

fn oct_digit0(input string) ?(string, string)

Recognizes zero or more octal characters: 0-7

fn oct_digit1

fn oct_digit1(input string) ?(string, string)

Recognizes one or more octal characters: 0-7

fn one_of

fn one_of(pattern string) Fn

Recognizes one of the provided characters.

fn opt

fn opt(f Fn) Fn

Optional parser: Will return '' if not successful.
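The sketch below shows both outcomes for an optional minus sign. `optional_minus` is a hypothetical helper; the empty-string result in the second case is taken from the description above.

```v
import vom { opt, tag }

// Hypothetical helper: a minus sign that may or may not be present.
fn optional_minus(input string) ?(string, string) {
    parser := opt(tag('-'))
    return parser(input)
}

fn main() {
    _, present := optional_minus('-5') ?
    assert present == '-'
    // No minus sign: opt still succeeds, with an empty output.
    _, missing := optional_minus('5') ?
    assert missing == ''
}
```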

fn peek

fn peek(f Fn) Fn

Tries to apply its parser without consuming the input.
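A lookahead check might be written as follows. The helper name `starts_with_let` is hypothetical, and the sketch only checks the remaining input, since that is what the description above guarantees.

```v
import vom { peek, tag }

// Hypothetical helper: checks for a leading 'let' without consuming it.
fn starts_with_let(input string) ?(string, string) {
    parser := peek(tag('let'))
    return parser(input)
}

fn main() {
    rest, _ := starts_with_let('let x') ?
    // The input is left untouched for the next parser.
    assert rest == 'let x'
}
```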

fn permutation

fn permutation(parsers ...Fn) FnMany

Applies a list of parsers in any order.

fn preceded

fn preceded(first Fn, second Fn) Fn

Matches an object from the first parser and discards it, then gets an object from the second parser.
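One way to use this is to drop a leading marker character. In the sketch below the helper name `symbol` and the colon prefix are assumptions chosen for the example.

```v
import vom { alpha1, preceded, tag }

// Hypothetical helper: parses a name that is prefixed with a colon.
fn symbol(input string) ?(string, string) {
    parser := preceded(tag(':'), alpha1)
    return parser(input)
}

fn main() {
    rest, output := symbol(':name rest') ?
    // The colon is matched and discarded, only the name is returned.
    assert output == 'name'
    assert rest == ' rest'
}
```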

fn recognize

fn recognize(f Parser) Fn

If the child parser was successful, return the consumed input as produced value.

fn satisfy

fn satisfy(condition fn (byte) bool) Fn

Recognizes one character and checks that it satisfies a predicate

fn separated_pair

fn separated_pair(first Fn, sep Fn, second Fn) FnMany

Gets an object from the first parser, then matches an object from the sep parser and discards it, then gets another object from the second parser.
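A key/value pair could be parsed along these lines. This is a sketch that assumes the two kept results arrive in order in the returned array; `key_value` is a hypothetical helper name.

```v
import vom { alpha1, digit1, separated_pair, tag }

// Hypothetical helper: parses 'key=value' with an alphabetic key and numeric value.
fn key_value(input string) ?(string, []string) {
    parser := separated_pair(alpha1, tag('='), digit1)
    return parser(input)
}

fn main() {
    rest, output := key_value('width=80;') ?
    // The '=' separator is discarded, the key and value are kept.
    assert output == ['width', '80']
    assert rest == ';'
}
```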

fn space0

fn space0(input string) ?(string, string)

Recognizes zero or more spaces and tabs.

fn space1

fn space1(input string) ?(string, string)

Recognizes one or more spaces and tabs.

fn tab

fn tab(input string) ?(string, string)

Matches a tab character '\t'.

fn tag

fn tag(pattern string) Fn

Recognizes a pattern.

fn tag_no_case

fn tag_no_case(pattern string) Fn

Recognizes a case insensitive pattern.

fn take

fn take(count int) Fn

Returns an input slice containing the first count input elements (input[..count]).

fn take_till

fn take_till(condition fn (byte) bool) Fn

Returns the longest input slice (if any) till a predicate is met.

fn take_till1

fn take_till1(condition fn (byte) bool) Fn

Returns the longest (at least 1) input slice till a predicate is met.

fn take_until

fn take_until(pattern string) Fn

Returns the input slice up to the first occurrence of the pattern.
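For instance, the text before a closing comment marker might be extracted like this. The helper name `comment_body` is hypothetical, and the sketch checks only the output, because the description above does not say what the remaining input contains.

```v
import vom { take_until }

// Hypothetical helper: returns everything before the closing comment marker.
fn comment_body(input string) ?(string, string) {
    parser := take_until('*/')
    return parser(input)
}

fn main() {
    _, output := comment_body('note */ code') ?
    assert output == 'note '
}
```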

fn take_until1

fn take_until1(pattern string) Fn

Returns the non empty input slice up to the first occurrence of the pattern.

fn take_while

fn take_while(condition fn (byte) bool) Fn

Returns the longest input slice (if any) that matches the predicate.

fn take_while1

fn take_while1(condition fn (byte) bool) Fn

Returns the longest (at least 1) input slice that matches the predicate.

fn take_while_m_n

fn take_while_m_n(m int, n int, condition fn (byte) bool) Fn

Returns the longest (m <= len <= n) input slice that matches the predicate.

fn terminated

fn terminated(first Fn, second Fn) Fn

Gets an object from the first parser, then matches an object from the second parser and discards it.
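A statement terminator is a typical case. The sketch below uses a hypothetical helper named `statement_number` and expects a semicolon after the digits.

```v
import vom { digit1, tag, terminated }

// Hypothetical helper: parses digits that must be followed by a semicolon.
fn statement_number(input string) ?(string, string) {
    parser := terminated(digit1, tag(';'))
    return parser(input)
}

fn main() {
    rest, output := statement_number('12;next') ?
    // The semicolon is matched and discarded.
    assert output == '12'
    assert rest == 'next'
}
```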

fn tuple

fn tuple(parsers ...Fn) FnMany

Applies a list of parsers one by one and returns their results as an array of strings.

type Fn

type Fn = fn (string) ?(string, string)

Parser which returns a single value

type FnMany

type FnMany = fn (string) ?(string, []string)

Parser which returns many values

type Parser

type Parser = Fn | FnMany

Core parser type