8.6. Clause Parsers

8.6.1. Or

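Or tries its parsers in order and returns the first value that is not None. The remaining parsers are skipped.

Example: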

Let's import our easydata module first.

>>> import easydata as ed

Let's write our Or parser.

test_html = '''
    <p class="brand">EasyData</p>
'''

or_parser = ed.Or(
    ed.Text(ed.pq('.brand-wrong-selector::text')),
    ed.Text(ed.pq('.brand::text'))
)

Now let's parse the test_html data and print the result.

print(or_parser.parse(test_html))

'EasyData'

The first parser, ed.Text(ed.pq('.brand-wrong-selector::text')), returned None because its selector matched nothing. Or then tried the next Text parser in line, which produced output because its selector was able to extract data from the HTML.

Note that even when a query selector finds a match, the next parser in line is still tried if the matched content is None.

Another example:

test_html = '''
    <p class="brand">EasyData</p>
    <p id="name">Easybook Pro 13</p>
'''

or_parser = ed.Or(
    ed.Text(ed.pq('#name::text')),
    ed.Text(ed.pq('.brand::text'))
)

Now let's parse the test_html data and print the result.

print(or_parser.parse(test_html))

'Easybook Pro 13'

In this case the value from the first parser is returned: this time its CSS selector found data, so its Text parser returned a value. Once a parser returns a value, all parsers further down the line are ignored.
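The first-non-None rule described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not easydata's implementation: each parser is modeled as an ordinary callable, `or_parse` is a hypothetical helper, and dict lookups stand in for the `pq()` selectors.

```python
from typing import Any, Callable, Optional

# Hypothetical model: a "parser" is any callable mapping source data to a value or None.
Parser = Callable[[Any], Optional[Any]]


def or_parse(data: Any, *parsers: Parser) -> Optional[Any]:
    """Return the first non-None value produced by `parsers`, tried in order."""
    for parser in parsers:
        value = parser(data)
        if value is not None:
            # Short-circuit: parsers further down the line are never called.
            return value
    # Every parser produced None.
    return None


# Dict lookups stand in for the pq() selectors used in the examples above.
product = {'brand': 'EasyData', 'name': 'Easybook Pro 13'}

print(or_parse(product, lambda d: d.get('wrong-key'), lambda d: d.get('brand')))  # EasyData
print(or_parse(product, lambda d: d.get('name'), lambda d: d.get('brand')))  # Easybook Pro 13
```

Because the loop returns as soon as it sees a value, a parser placed after a successful one is never executed. This mirrors the behavior described for both examples.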

8.6.2. With

With runs its parsers in sequence: the value produced by one parser becomes the input of the next.

Example:

Let's import our easydata module first.

>>> import easydata as ed

Let's write our With parser.

test_html = '''
    <div id="description">
        <ul class="features">
            <li>Material: aluminium <span>MATERIAL</span></li>
            <li>style: <strong>elegant</strong> is this</li>
            <li>Date added: Fri, 12 Dec 2018 10:55</li>
        </ul>
    </div>
'''

with_parser = ed.With(
    ed.Sentences(
        ed.pq('#description .features::text'),
        allow=['date added']
    ),
    ed.DateTimeSearch()
)

Now let's parse the test_html data and print the result.

print(with_parser.parse(test_html))

'12/12/2018 10:55:00'
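The chaining in this example (select the "date added" sentence, then parse a date out of it) can be sketched in plain Python. This is a hypothetical illustration, not easydata code. `with_parse`, `pick_date_sentence` and `parse_date_added` are made-up names that stand in for `ed.With`, `ed.Sentences(..., allow=['date added'])` and `ed.DateTimeSearch()`, and the date handling covers only this one input format. The output format string simply mimics the result shown above.

```python
from datetime import datetime
from typing import Any, Callable, List, Optional

Parser = Callable[[Any], Optional[Any]]


def with_parse(data: Any, *parsers: Parser) -> Optional[Any]:
    """Feed the output of each parser into the next one; stop early on None."""
    value = data
    for parser in parsers:
        value = parser(value)
        if value is None:
            return None
    return value


def pick_date_sentence(sentences: List[str]) -> Optional[str]:
    # Hypothetical stand-in for ed.Sentences(..., allow=['date added']).
    for sentence in sentences:
        if 'date added' in sentence.lower():
            return sentence
    return None


def parse_date_added(sentence: str) -> str:
    # Hypothetical stand-in for ed.DateTimeSearch(); handles only this one format.
    raw = sentence.split(':', 1)[1].strip()  # 'Fri, 12 Dec 2018 10:55'
    raw = raw.split(', ', 1)[1]              # drop the weekday name: '12 Dec 2018 10:55'
    parsed = datetime.strptime(raw, '%d %b %Y %H:%M')
    return parsed.strftime('%m/%d/%Y %H:%M:%S')


features = [
    'Material: aluminium MATERIAL',
    'style: elegant is this',
    'Date added: Fri, 12 Dec 2018 10:55',
]
print(with_parse(features, pick_date_sentence, parse_date_added))  # 12/12/2018 10:55:00
```

Stopping on None keeps a later parser from receiving an input it cannot handle. Whether easydata's With short-circuits the same way is an assumption of this sketch.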

8.6.3. ConcatText

ConcatText will combine string values of two or more parsers.

Example:

>>> import easydata as ed

Let's write our ConcatText parser.

test_html = '''
    <p class="brand">EasyData</p>
    <p id="name">Easybook Pro 13</p>
'''

concat_text_parser = ed.ConcatText(
    ed.Text(ed.pq('.brand::text')),
    ed.Text(ed.pq('#name::text'))
)

Now let's parse the test_html data and print the result.

print(concat_text_parser.parse(test_html))

'EasyData Easybook Pro 13'
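As a rough model of what ConcatText does, here is a plain-Python sketch under stated assumptions. `concat_text` is a hypothetical helper, not easydata's API. The single-space separator is inferred from the output above. Skipping parsers that return None, and returning None when nothing is left, are assumptions made for this illustration.

```python
from typing import Any, Callable, Optional

Parser = Callable[[Any], Optional[str]]


def concat_text(data: Any, *parsers: Parser, separator: str = ' ') -> Optional[str]:
    """Join the string outputs of `parsers` in order, skipping parsers that return None."""
    parts = [parser(data) for parser in parsers]
    kept = [part for part in parts if part is not None]
    # Assumption: with nothing to join, report "no value" rather than an empty string.
    return separator.join(kept) if kept else None


# Dict lookups stand in for the pq() selectors used in the example above.
product = {'brand': 'EasyData', 'name': 'Easybook Pro 13'}
print(concat_text(product, lambda d: d.get('brand'), lambda d: d.get('name')))
# EasyData Easybook Pro 13
```

The order of the parsers determines the order of the words in the result.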

8.6.4. JoinList

JoinList is similar to ConcatText, but instead of joining strings together, it joins lists together.

Example:

>>> import easydata as ed

Let's write our JoinList parser.

test_dict = {
    'features': [
        'gold color',
        'retina'
    ],
    'specs': [
        'i7 proc',
        '16 gb'
    ]
}

join_list_parser = ed.JoinList(
    ed.List(
        ed.jp('features'),
        parser=ed.Text()
    ),
    ed.List(
        ed.jp('specs'),
        parser=ed.Text()
    ),
)

Now let's parse the test_dict data and print the result.

print(join_list_parser.parse(test_dict))

['gold color', 'retina', 'i7 proc', '16 gb']
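The list joining can be pictured with a short plain-Python sketch. `join_list` is a hypothetical helper, not easydata's API, and each parser is modeled as a callable that returns a list or None. Skipping None outputs is an assumption of this sketch.

```python
from typing import Any, Callable, List, Optional

Parser = Callable[[Any], Optional[List[Any]]]


def join_list(data: Any, *parsers: Parser) -> List[Any]:
    """Concatenate the list outputs of `parsers` in order; None outputs are skipped."""
    joined: List[Any] = []
    for parser in parsers:
        value = parser(data)
        if value is not None:
            joined.extend(value)
    return joined


test_dict = {
    'features': ['gold color', 'retina'],
    'specs': ['i7 proc', '16 gb'],
}
print(join_list(test_dict, lambda d: d.get('features'), lambda d: d.get('specs')))
# ['gold color', 'retina', 'i7 proc', '16 gb']
```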

8.6.5. MergeDict

MergeDict is similar to JoinList, but instead of joining lists together, it merges dicts together.

Example:

>>> import easydata as ed

Let's write our MergeDict parser.

test_dict = {
    'features': {
        'color': 'gold',
        'display': 'retina'
    },
    'specs': {
        'proc': 'i7',
        'ram': '16 gb'
    }
}

merge_dict_parser = ed.MergeDict(
    ed.Dict(
        ed.jp('features'),
        key_parser=ed.Text(),
        value_parser=ed.Text()
    ),
    ed.Dict(
        ed.jp('specs'),
        key_parser=ed.Text(),
        value_parser=ed.Text()
    ),
)

Now let's parse the test_dict data and print the result.

print(merge_dict_parser.parse(test_dict))

{'color': 'gold', 'display': 'retina', 'proc': 'i7', 'ram': '16 gb'}
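A plain-Python sketch of the merge follows. `merge_dict` is a hypothetical helper, not easydata's API. The text above does not say how MergeDict resolves a key that appears in more than one dict. In this sketch the later dict wins, because it relies on `dict.update`; treat that as an assumption.

```python
from typing import Any, Callable, Dict, Optional

Parser = Callable[[Any], Optional[Dict[Any, Any]]]


def merge_dict(data: Any, *parsers: Parser) -> Dict[Any, Any]:
    """Merge the dict outputs of `parsers` in order; on a key clash the later dict wins."""
    merged: Dict[Any, Any] = {}
    for parser in parsers:
        value = parser(data)
        if value is not None:
            merged.update(value)
    return merged


test_dict = {
    'features': {'color': 'gold', 'display': 'retina'},
    'specs': {'proc': 'i7', 'ram': '16 gb'},
}
print(merge_dict(test_dict, lambda d: d.get('features'), lambda d: d.get('specs')))
# {'color': 'gold', 'display': 'retina', 'proc': 'i7', 'ram': '16 gb'}
```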

8.6.6. ItemDict

examples coming soon …

8.6.7. ItemList

examples coming soon …