# Appendix G: Regular expressions
Working with text is a common and fundamental task in day-to-day programming.
Lux's approach to it is the use of composable, monadic text parsers.
The idea is that a parser is a function that takes some text input, performs some calculations that consume that input, and then returns some value, along with the (remaining) unconsumed input.
Of course, the parser may fail, in which case the user should receive a meaningful error message to help figure out what happened.
The `library/lux/control/parser/text` library provides a type, and a host of combinators, for building and working with text parsers.
```clojure
(type: .public Offset
  Nat)

(type: .public Parser
  (//.Parser [Offset Text]))

... And from library/lux/control/parser
(type: .public (Parser s a)
  (-> s (Try [s a])))
```
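To give a feel for the monadic style, here is a sketch of a small parser that matches a lower-case letter followed by a decimal digit. This is illustrative rather than verified code: the combinator names (`lower`, `decimal`, `result`), the module aliases, and the `in` function are assumptions about the `library/lux/control/parser` and `library/lux/control/parser/text` APIs.

```clojure
... A hypothetical sketch; combinator and module names are assumed.
... "<>" aliases library/lux/control/parser, and "<text>" aliases
... library/lux/control/parser/text.
(def: letter&digit
  (<text>.Parser [Text Text])
  (do <>.monad
    [letter <text>.lower
     digit <text>.decimal]
    (in [letter digit])))

... Running it would consume the input and yield the parsed pair
... (wrapped in Try), e.g.:
... (<text>.result letter&digit "a1")
```

Because each parser is just a value, small parsers like this one can be fed into combinators to build arbitrarily large ones.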
A good example of text parsers being used is the `library/lux/data/format/json` module, which implements full JSON serialization.
---
However, programmers coming from other languages may be familiar with a different approach to text processing that has been very popular for many years now: regular expressions.
Regular expressions offer a concise syntax for building text parsers that is great for writing quick text-processing tools.
Lux also supports this style in its `library/lux/data/text/regex` module, which offers the `regex` macro.
The `regex` macro compiles the given syntax into a text parser, which means you can combine both approaches, for maximum flexibility.
Here are some examples of regular expressions:
```clojure
... Literals
(regex "a")
... Wildcards
(regex ".")
... Escaping
(regex "\.")
... Character classes
(regex "\d")
(regex "\p{Lower}")
(regex "[abc]")
(regex "[a-z]")
(regex "[a-zA-Z]")
(regex "[a-z&&[def]]")
... Negation
(regex "[^abc]")
(regex "[^a-z]")
(regex "[^a-zA-Z]")
(regex "[a-z&&[^bc]]")
(regex "[a-z&&[^m-p]]")
... Combinations
(regex "aa")
(regex "a?")
(regex "a*")
(regex "a+")
... Specific amounts
(regex "a{2}")
... At least
(regex "a{1,}")
... At most
(regex "a{,1}")
... Between
(regex "a{1,2}")
... Groups
(regex "a(.)c")
(regex "a(b+)c")
(regex "(\d{3})-(\d{3})-(\d{4})")
(regex "(\d{3})-(?:\d{3})-(\d{4})")
(regex "(?<code>\d{3})-\k<code>-(\d{4})")
(regex "(?<code>\d{3})-\k<code>-(\d{4})-\\0")
(regex "(\d{3})-((\d{3})-(\d{4}))")
... Alternation
(regex "a|b")
(regex "a(.)(.)|b(.)(.)")
```
Another awesome feature of the `regex` macro is that it will build fully type-safe code for you.
This is important because the groups and alternations that you use in your regular expression will affect the type of the `regex` expression.
For example:
```clojure
... This returns a single piece of text
(regex "a{1,}")

... But this one returns a pair of texts
... The first is the whole match: aXc
... And the second is the thing that got matched: the X itself
(regex "a(.)c")

... That means these are the types of these regular expressions:
(: (Parser Text)
   (regex "a{1,}"))

(: (Parser [Text Text])
   (regex "a(.)c"))
```
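Since the parser produced by `regex` is just a text parser, it can presumably be run like any other. The following is a hedged sketch: the `result` function name, the `"<text>"` alias, and the exact output shape (whole match followed by each group) are assumptions, not verified API.

```clojure
... Hypothetical usage; "<text>" aliases library/lux/control/parser/text,
... and the output shape is assumed from the group semantics above:
(<text>.result (regex "(\d{3})-(\d{3})-(\d{4})")
               "809-345-6789")

... would yield something like:
... (#try.Success ["809-345-6789" "809" "345" "6789"])
```

Note how the three capture groups show up as three extra members of the output tuple, which is exactly what the parser's type promises at compile time.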
---
The benefit of parsers is that they are a bit easier to understand when read (due to their verbosity), and that they are very easy to combine (thanks to their monadic nature and the combinator library).
The benefits of regular expressions are their familiarity to a lot of programmers, and how quick they are to write.
Ultimately, it makes the most sense to provide both mechanisms to Lux programmers, and let everyone choose whatever they find most useful.