6.3. Url¶
6.3.1. Url¶
-
class easydata.parsers.url.Url(*args, from_text: bool = False, from_qs: Optional[str] = None, from_qs_unquote: Optional[str] = None, remove_qs: Optional[Union[str, list, bool]] = None, qs: Optional[dict] = None, domain: Optional[str] = None, protocol: Optional[str] = None, normalize: bool = True, **kwargs)
Bases: easydata.parsers.text.Text
The Url parser is based on the Text parser and therefore inherits all of its parameters and usage. One difference is that the normalize parameter is set to False, while in the Text parser it is set to True by default.
For parameters other than the ones described here, see the Text documentation.
Getting Started¶
>>> import easydata as ed
>>> test_dict = {'url': 'demo.com/home'}
>>> ed.Url(ed.jp('url')).parse(test_dict)
'https://demo.com/home'
In this case the url in test_dict is partial: it has no protocol. The Url parser always tries to construct and output a full url.
Parameters¶
-
qs
¶
With the qs parameter we can manipulate a url's query string. We can change existing values or add new ones.
Let's first change an existing value.
>>> ed.Url(qs={'home': 'false'}).parse('https://demo.com/?home=true')
'https://demo.com/?home=false'
Now let's change an existing value and add a new query string value at the same time.
>>> test_url = 'https://demo.com/?home=true'
>>> ed.Url(qs={'home': 'false', 'country': 'SI'}).parse(test_url)
'https://demo.com/?home=false&country=SI'
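As a rough illustration of what such a query string update involves, here is a minimal sketch built only on the standard library. It is not how easydata implements qs; the helper name set_query_params is hypothetical and exists only for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_params(url: str, params: dict) -> str:
    """Hypothetical helper: override or append query string values."""
    parts = urlsplit(url)
    # A dict keeps insertion order, so existing keys stay in place and
    # new keys are appended. Repeated keys collapse to their last value.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


print(set_query_params('https://demo.com/?home=true', {'home': 'false', 'country': 'SI'}))
```

Under these assumptions the call prints the same url as the qs example above, with home changed and country appended.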
-
remove_qs
¶
With remove_qs we can remove query string keys and their values.
If we pass a str key to remove_qs, then only that single query string key and its value will be removed, as we can see below.
>>> ed.Url(remove_qs='home').parse('https://demo.com/?home=false&country=SI')
'https://demo.com/?country=SI'
We can also remove multiple query string keys and their values at the same time by passing a list of str keys to the remove_qs parameter.
>>> test_url = 'https://demo.com/?home=false&country=SI&currency=EUR'
>>> ed.Url(remove_qs=['home', 'country']).parse(test_url)
'https://demo.com/?currency=EUR'
If we set remove_qs to True, then all query string keys and values will be removed.
>>> ed.Url(remove_qs=True).parse('https://demo.com/?home=false&country=SI')
'https://demo.com/'
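The three accepted forms of remove_qs (a str, a list of str, or True) can be mirrored with the standard library. The sketch below is only an approximation of the behaviour described above, not easydata's own code; remove_query_params is a hypothetical name.

```python
from typing import List, Optional, Union
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_query_params(url: str, remove: Optional[Union[str, List[str], bool]]) -> str:
    """Hypothetical helper: drop one key, several keys, or the whole query."""
    if remove is None or remove is False:
        return url  # nothing to remove
    parts = urlsplit(url)
    if remove is True:
        kept = []  # drop every key/value pair
    else:
        keys = {remove} if isinstance(remove, str) else set(remove)
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        kept = [(key, value) for key, value in pairs if key not in keys]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = 'https://demo.com/?home=false&country=SI&currency=EUR'
print(remove_query_params(url, 'home'))
print(remove_query_params(url, ['home', 'country']))
print(remove_query_params(url, True))
```

Treating True as "remove everything" while leaving None and False as a no-op keeps the helper's default harmless, which matches a remove_qs default of None.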
-
from_text
¶
The Url parser can extract a url from surrounding text, as we can see in the example below.
>>> ed.Url(from_text=True).parse('Home url is: https://demo.com/home !!!')
'https://demo.com/home'
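To give a feel for this kind of extraction, here is a deliberately simple regular expression sketch. The pattern and the helper name first_url are assumptions made for illustration; the matching rules easydata actually applies with from_text are not documented here and may differ.

```python
import re
from typing import Optional

# Assumed pattern: an http(s) scheme followed by a run of non-whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def first_url(text: str) -> Optional[str]:
    """Hypothetical helper: return the first url found in text, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Trim punctuation that commonly trails a url inside a sentence.
    return match.group(0).rstrip(".,;:!?)")


print(first_url('Home url is: https://demo.com/home !!!'))
```

A pattern this loose will accept malformed urls and will miss urls written without a protocol, so it is only a starting point.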
-
domain
¶
In some cases, especially when scraping websites, we get only partial url links without a domain. Setting the domain parameter to a domain name lets the parser construct the full url.
>>> ed.Url(domain='http://demo.com').parse('/product/1122')
'http://demo.com/product/1122'
The domain parameter value can also be provided without a protocol such as http or https. In that case the default protocol https is used to construct the full url.
>>> ed.Url(domain='demo.com').parse('/product/1122')
'https://demo.com/product/1122'
Note
The default value of the domain parameter can be defined through the config variable ED_URL_DOMAIN in a model.
-
protocol
¶
As we saw in the example above, the default protocol https is used when the domain name given in the domain parameter has no protocol. We can change this default by passing a different value to the protocol parameter.
>>> ed.Url(domain='demo.com', protocol='ftp').parse('/product/1122')
'ftp://demo.com/product/1122'
Note
The default value of the protocol parameter can be defined through the config variable ED_URL_PROTOCOL in a config file or a model.
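The interplay between domain and protocol can be approximated with urllib.parse.urljoin. This is a sketch under stated assumptions rather than easydata's implementation: build_full_url is a hypothetical helper, and it assumes the protocol is applied only when the domain carries none.

```python
from urllib.parse import urljoin


def build_full_url(path: str, domain: str, protocol: str = "https") -> str:
    """Hypothetical helper: join a partial path onto a domain."""
    # Prefix the domain with the fallback protocol only when it has none.
    base = domain if "://" in domain else f"{protocol}://{domain}"
    return urljoin(base.rstrip("/") + "/", path)


print(build_full_url('/product/1122', 'http://demo.com'))
print(build_full_url('/product/1122', 'demo.com'))
print(build_full_url('/product/1122', 'demo.com', protocol='ftp'))
```

A protocol already present in the domain wins over the protocol argument in this sketch, which is consistent with the examples above where 'http://demo.com' is kept as http.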
-
from_qs
¶
-
from_qs_unquote
¶