To extract only the text from an HTML page in a robust way, without resorting to regex, urllib, or the like, use the Python library below:
https://github.com/aaronsw/html2text
Usage from the terminal:
Usage: html2text.py [(filename|url) [encoding]]
If you want to use it inside your Python code:
import html2text
print(html2text.html2text("<p>Hello, world.</p>"))
#python #html2text #github #html #text