8.6. Clause Parsers
8.6.1. Or
Example:
Let's import the easydata module first.
>>> import easydata as ed
Let's write our Or parser.
test_html = '''
<p class="brand">EasyData</p>
'''
or_parser = ed.Or(
    ed.Text(ed.pq('.brand-wrong-selector::text')),
    ed.Text(ed.pq('.brand::text'))
)
Now let's parse the test_html data and print the result.
print(or_parser.parse(test_html))
'EasyData'
The first parser, ed.Text(ed.pq('.brand-wrong-selector::text')), returned None because its selector matched nothing. The next Text parser in line produced the output, since its selector was able to extract data from the HTML.
Note that even if a query selector finds a match, Or still tries the next parser in line when the matched content is None.
Another example:
test_html = '''
<p class="brand">EasyData</p>
<p id="name">Easybook Pro 13</p>
'''
or_parser = ed.Or(
    ed.Text(ed.pq('#name::text')),
    ed.Text(ed.pq('.brand::text'))
)
Now let's parse the test_html data and print the result.
print(or_parser.parse(test_html))
'Easybook Pro 13'
In this case the value from the first parser was returned: its CSS selector found data this time, so the Text parser returned a value. Once a parser returns a value, all parsers further down the line are ignored.
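The fallback behaviour described above can be sketched in plain Python. This is only an illustration of the "first value that is not None wins" idea, not easydata's implementation; `first_non_none` and the three small parser functions are hypothetical names, and a plain dict stands in for the HTML.

```python
from typing import Any, Callable, List, Optional

# Hypothetical stand-in for a parser: any callable that takes the raw data
# and returns a value, or None when nothing could be extracted.
Parser = Callable[[Any], Optional[Any]]


def first_non_none(data: Any, *parsers: Parser) -> Optional[Any]:
    """Return the first parser result that is not None."""
    for parser in parsers:
        result = parser(data)
        if result is not None:
            return result  # parsers further down the line are never called
    return None


calls: List[str] = []  # records which parsers actually ran


def wrong_selector(data: dict) -> Optional[str]:
    calls.append("wrong_selector")
    return data.get("brand-wrong-selector")  # key is missing, so None


def brand(data: dict) -> Optional[str]:
    calls.append("brand")
    return data.get("brand")


def name(data: dict) -> Optional[str]:
    calls.append("name")
    return data.get("name")


page = {"brand": "EasyData", "name": "Easybook Pro 13"}

# First example: the first parser yields None, so the second one is used.
fallback_result = first_non_none(page, wrong_selector, brand)
print(fallback_result)  # EasyData

# Second example: the first parser succeeds, so the second one never runs.
calls.clear()
first_match_result = first_non_none(page, name, brand)
print(first_match_result)  # Easybook Pro 13
print(calls)  # ['name']
```

The `calls` list is there only to make the short-circuit visible: after the second call it contains a single entry, because the loop returns as soon as a value is found.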
8.6.2. With
Example:
Let's import the easydata module.
>>> import easydata as ed
Let's write our With parser.
test_html = '''
<div id="description">
<ul class="features">
<li>Material: aluminium <span>MATERIAL</span></li>
<li>style: <strong>elegant</strong> is this</li>
<li>Date added: Fri, 12 Dec 2018 10:55</li>
</ul>
</div>
'''
with_parser = ed.With(
    ed.Sentences(
        ed.pq('#description .features::text'),
        allow=['date added']
    ),
    ed.DateTimeSearch()
)
Now let's parse the test_html data and print the result.
print(with_parser.parse(test_html))
'12/12/2018 10:55:00'
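This section does not describe With in words. Judging from the example, it appears to feed the output of the first parser into the next one: the sentence containing "date added" is selected first, and the date search then runs on that sentence only. The plain-Python sketch below illustrates that chaining idea under this assumption. `run_chain`, `pick_sentence` and `search_datetime` are hypothetical helpers, not easydata functions, and stopping the chain on None is a choice made for the sketch.

```python
import re
from datetime import datetime
from typing import Any, Callable, List, Optional

# Month abbreviations mapped by hand so the sketch does not depend on locale.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def pick_sentence(lines: List[str], allow: List[str]) -> Optional[str]:
    """Return the first line containing an allowed keyword (case-insensitive)."""
    for line in lines:
        if any(keyword.lower() in line.lower() for keyword in allow):
            return line
    return None


def search_datetime(text: str) -> Optional[str]:
    """Find a '12 Dec 2018 10:55' style timestamp and reformat it."""
    match = re.search(r"(\d{1,2}) ([A-Z][a-z]{2}) (\d{4}) (\d{1,2}):(\d{2})", text)
    if match is None or match.group(2) not in MONTHS:
        return None
    day, month, year, hour, minute = match.groups()
    parsed = datetime(int(year), MONTHS[month], int(day), int(hour), int(minute))
    return parsed.strftime("%m/%d/%Y %H:%M:%S")


def run_chain(data: Any, *steps: Callable[[Any], Any]) -> Any:
    """Pass the output of each step to the next; stop early on None."""
    value = data
    for step in steps:
        if value is None:
            return None
        value = step(value)
    return value


features = [
    "Material: aluminium",
    "style: elegant is this",
    "Date added: Fri, 12 Dec 2018 10:55",
]

with_result = run_chain(
    features,
    lambda lines: pick_sentence(lines, allow=["date added"]),
    search_datetime,
)
print(with_result)  # 12/12/2018 10:55:00
```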
8.6.3. ConcatText
ConcatText combines the string values of two or more parsers.
Example:
>>> import easydata as ed
Let's write our ConcatText parser.
test_html = '''
<p class="brand">EasyData</p>
<p id="name">Easybook Pro 13</p>
'''
concat_text_parser = ed.ConcatText(
    ed.Text(ed.pq('.brand::text')),
    ed.Text(ed.pq('#name::text'))
)
Now let's parse the test_html data and print the result.
print(concat_text_parser.parse(test_html))
'EasyData Easybook Pro 13'
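As a rough illustration of what combining string values means, the sketch below joins values in plain Python. It is not easydata's code: `concat_text` is a hypothetical helper, the single-space separator is inferred from the output shown above, and skipping None or empty values is an assumption made for the sketch.

```python
from typing import Iterable, Optional


def concat_text(values: Iterable[Optional[str]], separator: str = " ") -> Optional[str]:
    """Join the non-empty string values in the order they are given."""
    parts = [value.strip() for value in values if value is not None and value.strip()]
    return separator.join(parts) if parts else None


concat_result = concat_text(["EasyData", "Easybook Pro 13"])
print(concat_result)  # EasyData Easybook Pro 13
```

The values are joined in the order given, so brand followed by name yields the brand first in the result.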
8.6.4. JoinList
JoinList is similar to ConcatText, but instead of joining two str values together, it joins two list values together.
Example:
>>> import easydata as ed
Let's write our JoinList parser.
test_dict = {
    'features': [
        'gold color',
        'retina'
    ],
    'specs': [
        'i7 proc',
        '16 gb'
    ]
}
join_list_parser = ed.JoinList(
    ed.List(
        ed.jp('features'),
        parser=ed.Text()
    ),
    ed.List(
        ed.jp('specs'),
        parser=ed.Text()
    ),
)
Now let's parse the test_dict data and print the result.
print(join_list_parser.parse(test_dict))
['gold color', 'retina', 'i7 proc', '16 gb']
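The joining step itself is easy to picture in plain Python. The sketch below is an illustration only, not easydata's implementation; `join_lists` is a hypothetical helper, and ignoring None inputs is an assumption made for the sketch.

```python
from itertools import chain
from typing import Iterable, List, Optional


def join_lists(*lists: Optional[Iterable[str]]) -> List[str]:
    """Flatten several lists into one, keeping the original order."""
    return list(chain.from_iterable(items for items in lists if items is not None))


test_dict = {
    "features": ["gold color", "retina"],
    "specs": ["i7 proc", "16 gb"],
}

joined = join_lists(test_dict["features"], test_dict["specs"])
print(joined)  # ['gold color', 'retina', 'i7 proc', '16 gb']
```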
8.6.5. MergeDict
MergeDict is similar to JoinList, but instead of joining two list values together, it merges two dict values together.
Example:
>>> import easydata as ed
Let's write our MergeDict parser.
test_dict = {
    'features': {
        'color': 'gold',
        'display': 'retina'
    },
    'specs': {
        'proc': 'i7',
        'ram': '16 gb'
    }
}
merge_dict_parser = ed.MergeDict(
    ed.Dict(
        ed.jp('features'),
        key_parser=ed.Text(),
        value_parser=ed.Text()
    ),
    ed.Dict(
        ed.jp('specs'),
        key_parser=ed.Text(),
        value_parser=ed.Text()
    ),
)
Now let's parse the test_dict data and print the result.
print(merge_dict_parser.parse(test_dict))
{'color': 'gold', 'display': 'retina', 'proc': 'i7', 'ram': '16 gb'}
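For comparison, merging dictionaries in plain Python might look like the sketch below. It is an illustration, not easydata's code: `merge_dicts` is a hypothetical helper. In this sketch a key that appears in more than one input takes the value from the later input; how MergeDict itself resolves duplicate keys is not stated here, so treat that part as an assumption.

```python
from typing import Dict, Mapping, Optional


def merge_dicts(*dicts: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Merge several dicts into a new one; later inputs win on duplicate keys."""
    merged: Dict[str, str] = {}
    for mapping in dicts:
        if mapping is not None:
            merged.update(mapping)
    return merged


test_dict = {
    "features": {"color": "gold", "display": "retina"},
    "specs": {"proc": "i7", "ram": "16 gb"},
}

merged_result = merge_dicts(test_dict["features"], test_dict["specs"])
print(merged_result)
# {'color': 'gold', 'display': 'retina', 'proc': 'i7', 'ram': '16 gb'}
```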
8.6.6. ItemDict
Examples coming soon …
8.6.7. ItemList
Examples coming soon …