LXML is a lightweight, effective library for parsing HTML and XML without resorting to regular expressions. It installs easily with pip and works on both Python 2 and 3. As an example, let’s extract the token and expires form values from the NYTimes login page. The example also uses the requests library to fetch the page, which can be installed with pip in the same way.
Installation of LXML
# Install lxml using pip3
pip3 install lxml

# Verify it
pip3 list
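If you would rather confirm the install from Python itself, a minimal sketch like the one below should do. It only assumes that lxml imported successfully; `etree.LXML_VERSION` and `etree.LIBXML_VERSION` are version tuples exposed by lxml, and the exact numbers will depend on your machine.

```python
from lxml import etree

# Version of the lxml package itself, as a tuple of integers
lxml_version = etree.LXML_VERSION

# Version of the underlying libxml2 C library that lxml is using
libxml_version = etree.LIBXML_VERSION

print("lxml version:", ".".join(str(part) for part in lxml_version))
print("libxml2 version:", ".".join(str(part) for part in libxml_version))
```

If the import fails with an ImportError, the package was most likely installed for a different Python interpreter than the one you are running.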
Using LXML
# Import LXML parser
import lxml.html
import requests

# Use requests library to get the URL
htmlstr = requests.get('https://myaccount.nytimes.com/auth/login/?URI=http://www.nytimes.com/2014/09/13/opinion/on-long-island-a-worthy-plan-for-coastal-flooding.html?partner=rss')

# Create an HTML tree
htmltree = lxml.html.document_fromstring(htmlstr.content)

# Use XPath to get Token value
for input_el in htmltree.xpath("//input[@name='token']/@value"):
    token_val = input_el

# Use XPath to get Expires value
for input_el_2 in htmltree.xpath("//input[@name='expires']/@value"):
    expires_val = input_el_2

# Printing it all out
print(token_val)
print(expires_val)
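Live pages change over time, so it can help to try the same XPath idea offline first. The sketch below is not the NYTimes markup: `SAMPLE_HTML` is a hand-written stand-in and `first_value` is a hypothetical helper introduced only for illustration. It returns None when a field is missing, so a lookup that matches nothing does not leave a variable undefined.

```python
import lxml.html

# Hypothetical stand-in for a login page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <form method="post" action="/auth/login">
    <input type="hidden" name="token" value="abc123">
    <input type="hidden" name="expires" value="1414572834">
    <input type="text" name="userid">
  </form>
</body></html>
"""


def first_value(tree, field_name):
    """Return the value of the first <input> with this name, or None."""
    # $name is an XPath variable; lxml fills it from the keyword argument
    matches = tree.xpath("//input[@name=$name]/@value", name=field_name)
    return matches[0] if matches else None


tree = lxml.html.document_fromstring(SAMPLE_HTML)

token_val = first_value(tree, "token")
expires_val = first_value(tree, "expires")
missing_val = first_value(tree, "no_such_field")

print(token_val)
print(expires_val)
print(missing_val)
```

Passing the field name as an XPath variable, rather than formatting it into the query string, avoids quoting problems if the name ever contains an apostrophe.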
Result
If all went well, you should see something like this on your terminal:
0f5d2c48c813aeaaccf1bc3e68fbda53dd691bca99fc8d27e864b041e534cc9f1c8a837cab3f9e70a5fc1852097f23ecd67cc58b29a2b654ea7b925e91b0addf4726ed43bbe82baf6e8c0f179a2198362fa55dc724cebb9f41f794bee6ec767410aafdfba9495716e059d649ee2c68edc82131f1f5b08681024d881fe38920c7ea8ca44c4b4a190122718f2123238b76d758825d422aeda868942f0d17c331d157e2130e58c97d61a5aa24399b88bcedfa910000c68fd66415f96aea74f44731a1e8c92cadb747bc77bdeacdbc943fa483aa1708617400ee2255f63f6a768f5d701444db2fa484928719c52bb943a5264ec96175e9f06572717343282f89d9de
1414572834
About Ali Gajani
Hi. I am Ali Gajani. I started Mr. Geek in early 2012 as a result of my growing enthusiasm and passion for technology. I love sharing my knowledge and helping out the community by creating useful, engaging and compelling content. If you want to write for Mr. Geek, just PM me on my Facebook profile.