6.3. Url
6.3.1. Url
-
class easydata.parsers.url.Url(*args, from_text: bool = False, from_qs: Optional[str] = None, from_qs_unquote: Optional[str] = None, remove_qs: Optional[Union[str, list, bool]] = None, qs: Optional[dict] = None, domain: Optional[str] = None, protocol: Optional[str] = None, normalize: bool = True, **kwargs)
Bases: easydata.parsers.text.Text
The Url parser is based on the Text parser and therefore inherits all of its
parameters and its usage. One difference is that the normalize parameter is set to
False by default, while in the Text parser it is set to True.
For parameters other than the ones described here, please see the Text documentation.
Getting Started
>>> import easydata as ed
>>> test_dict = {'url': 'demo.com/home'}
>>> ed.Url(ed.jp('url')).parse(test_dict)
'https://demo.com/home'
In this case the url in test_dict is partial. The Url parser will always try
to construct and output a full url.
Parameters
-
qs
With the qs parameter we can manipulate a url's query string. We can change
existing values or add new ones.
Let's first try to change an existing one.
>>> ed.Url(qs={'home': 'false'}).parse('https://demo.com/?home=true')
'https://demo.com/?home=false'
Now let's try to change an existing value and add a new query string value at the same time.
>>> test_url = 'https://demo.com/?home=true'
>>> ed.Url(qs={'home': 'false', 'country': 'SI'}).parse(test_url)
'https://demo.com/?home=false&country=SI'
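As a rough illustration of the behaviour shown above, and not easydata's actual implementation, the same update can be sketched with Python's standard urllib.parse module. The helper name set_query_params is hypothetical and exists only for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_params(url: str, params: dict) -> str:
    """Hypothetical helper: overwrite existing query values and append new ones."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves keys with empty values such as "?flag=".
    # Converting to a dict keeps only the last value of a repeated key.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)  # existing keys are replaced, new keys are appended
    return urlunsplit(parts._replace(query=urlencode(query)))


updated = set_query_params(
    'https://demo.com/?home=true', {'home': 'false', 'country': 'SI'}
)
print(updated)
```

Because Python dicts preserve insertion order, changed keys keep their original position and new keys land at the end, which matches the output of the qs examples.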
-
remove_qs
With remove_qs we can remove query string keys and their values.
If we pass a str key to remove_qs, then only that single query string key
and its value will be removed, as we can see below.
>>> ed.Url(remove_qs='home').parse('https://demo.com/?home=false&country=SI')
'https://demo.com/?country=SI'
We can also remove multiple query string keys and their values at the same time
by passing a list of str keys to the remove_qs parameter.
>>> test_url = 'https://demo.com/?home=false&country=SI&currency=EUR'
>>> ed.Url(remove_qs=['home', 'country']).parse(test_url)
'https://demo.com/?currency=EUR'
If we set remove_qs to True then all query string keys and values
will be removed.
>>> ed.Url(remove_qs=True).parse('https://demo.com/?home=false&country=SI')
'https://demo.com/'
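The three accepted forms of remove_qs (a single str key, a list of keys, or True) can be sketched with the standard library as follows. This is only an approximation of the documented behaviour, under the assumption that unmatched keys are left untouched; drop_query_params is a hypothetical name, not part of easydata.

```python
from typing import Iterable, Union
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_params(url: str, remove: Union[str, Iterable[str], bool, None]) -> str:
    """Hypothetical helper: remove one key, several keys, or the whole query string."""
    if not remove:
        # None, False, '' or an empty list: nothing to remove.
        return url
    parts = urlsplit(url)
    if remove is True:
        kept = []  # drop every key/value pair
    else:
        keys = {remove} if isinstance(remove, str) else set(remove)
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        kept = [(key, value) for key, value in pairs if key not in keys]
    return urlunsplit(parts._replace(query=urlencode(kept)))


test_url = 'https://demo.com/?home=false&country=SI&currency=EUR'
print(drop_query_params(test_url, 'home'))
print(drop_query_params(test_url, ['home', 'country']))
print(drop_query_params(test_url, True))
```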
-
from_text
The Url parser can extract a url from text, as we can see in the example
below.
>>> ed.Url(from_text=True).parse('Home url is: https://demo.com/home !!!')
'https://demo.com/home'
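The source does not describe how from_text locates the url, so the following is only a minimal sketch of one common approach: a regular expression search for the first http or https link. The pattern and the first_url helper are assumptions made for illustration, not easydata internals.

```python
import re
from typing import Optional

# Assumed pattern: an http(s) scheme followed by anything that is not
# whitespace, an angle bracket or a double quote.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def first_url(text: str) -> Optional[str]:
    """Hypothetical helper: return the first http(s) url found in text, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Trim punctuation that commonly trails a url in running text.
    return match.group(0).rstrip('.,;:!?)')


print(first_url('Home url is: https://demo.com/home !!!'))
```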
-
domain
In some cases we get only partial url links without a domain, especially
when scraping websites. In such cases, setting the domain parameter
to a domain name lets the parser construct the full url.
>>> ed.Url(domain='http://demo.com').parse('/product/1122')
'http://demo.com/product/1122'
The domain parameter value can also be provided without a protocol such as http or
https. In that case the default protocol https will be used to
construct the full url.
>>> ed.Url(domain='demo.com').parse('/product/1122')
'https://demo.com/product/1122'
Note
The default value of the domain parameter can be defined through the config variable ED_URL_DOMAIN in a model.
-
protocol
As we saw in the example above, the default protocol https is used when the domain
name provided in the domain parameter has no protocol. We can override the default
https by passing a different value to the protocol parameter.
>>> ed.Url(domain='demo.com', protocol='ftp').parse('/product/1122')
'ftp://demo.com/product/1122'
Note
The default value of the protocol parameter can be defined through the config variable ED_URL_PROTOCOL in a config file or a model.
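The interplay between domain and protocol described above can be summarised in a small standalone sketch. It assumes the rule is simply "use the protocol only when the domain does not already carry one"; the absolute_url helper is hypothetical and is not how easydata is known to implement it.

```python
def absolute_url(path: str, domain: str, protocol: str = 'https') -> str:
    """Hypothetical helper: build a full url from a partial path and a domain."""
    if '://' in path:
        # The input is already a full url, so leave it unchanged.
        return path
    # Prefix the domain with the protocol only when it has none of its own.
    base = domain if '://' in domain else f'{protocol}://{domain}'
    # Join with exactly one slash between the domain and the path.
    return base.rstrip('/') + '/' + path.lstrip('/')


print(absolute_url('/product/1122', 'http://demo.com'))
print(absolute_url('/product/1122', 'demo.com'))
print(absolute_url('/product/1122', 'demo.com', protocol='ftp'))
```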
-
from_qs
-
from_qs_unquote