3. Architecture¶
easydata
consist of several components:
model
block models
parsers
queries
data processors
item processors
data bag
Each of the easydata
components can be used independently to process data, which
makes writing tests significantly easier. This is is easier because there is not a need
to utilize mocks in testing.
Most important component is a model.
The Model component glues all of the other components together, parses data, and outputs an item dictionary.
First let create some variables, which will hold different kinds of data, and this will be
passed to a parse_item
method later in this tutorial.
>> json_text = '{"price": 999.90}'
>> html_text = '<p class="sale_price">499.9<p>'
First we will create a simple ItemModel
and explain how the data is passed and processed
through other components.
import easydata as ed
class ProductItemModel(ed.ItemModel):
data_processors = [
ed.DataJsonToDictProcessor()
]
item_price = ed.PriceFloat(ed.jp('price'))
_item_sale_price = ed.PriceFloat(
ed.pq('sale_price::text'),
source='html'
)
item_processors = [
ed.ItemDiscountProcessor()
]
>> product_model = ProductItemModel()
When we initialize our ProductItemModel
nothing happens. Initialization of processors
and other core components is done after we call parse_item
method for a first time.
Note
The design decision to not use __init__
for class initialization is in order to add
ItemModel
as a mixin to your existing class.
Now lets pass our variables with different types of data to the parse_item
method.
>> product_model.parse_item(data=json_text, html=html_text)
{'price': 999.9, 'discount': 50.01}
Note
In the result, we are missing the sale_price
key in our dictionary. This is intentional
since all properties that start with _item
will be deleted before the final output.
When we pass our json_text
and html_text
to the parse_item
method, our model will get registered
with model manager which is in charge of handling are components specified in our model. The model
is registered within model manager only when we call the parse_item
method for the first time. In the next
step, the model manager is passed through the json_text
and html_text
data will be stored
into a DataBag
object dictionary under data
and html
keys respectively. All parsers
and processors will by default look in a DataBag
for a data
key, unless specified
otherwise in a processor or a parser. We can see in our example model above, that a PriceFloat
parser for a _item_sale_price
property has a value html
in it’s source
parameter
… this means that under the hood parser will try to extract data from html
key in our
DataBag
dictionary rather than default data
key. Similar principles apply also for
data processors.
Note
DataBag
is a dictionary based object, which is used through all parsing cycle in
a model. All other components (except item_processors
) have access to it in
order to extract, create, modify or delete data in a DataBag
dictionary.
When DataBag
is created under the hood on a parse_item
call, it will be passed
first through data processors, where it will be modified or transformed and in next
step will be passed further to item parsers. In item parsers, data will be extracted from
a DataBag
and it’s values are stored in a item dictionary.
Before the final output, the item dictionary will get passed through item_processors
, and, if needed,
the item dictionary keys or values will be modified.
3.1. Next steps¶
To get a better understanding regarding processors and many other components, please proceed further to the Advanced section.