Basic doc example no longer works #576

Open
dannykellett opened this issue Apr 30, 2024 · 3 comments

dannykellett commented Apr 30, 2024

I followed the example from the docs here: https://requests-html.kennethreitz.org/

from requests_html import HTMLSession
def main() -> None:
    session = HTMLSession()
    r = session.get('https://python.org/')
    print(f"all links = {r.html.absolute_links}")

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "E:\11-Projects\learning_requests_html.py", line 1, in <module>
    from requests_html import HTMLSession
  File "E:\11-Projects.venv\Lib\site-packages\requests_html.py", line 14, in <module>
    from lxml.html.clean import Cleaner
  File "E:\11-Projects.venv\Lib\site-packages\lxml\html\clean.py", line 18, in <module>
    raise ImportError(
ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
Install lxml[html_clean] or lxml_html_clean directly.

I should mention that it worked after installing lxml, but I wanted to point out that the docs are not correct.
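
For anyone hitting the same ImportError, here is a minimal sketch of a pre-flight check that looks for the split-out package before importing requests_html. The package names come from the error message above; the helper names `has_module` and `clean_module_hint` are made up for illustration and are not part of lxml or requests_html.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located on the import path (hypothetical helper)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent
        # or for modules with a broken __spec__.
        return False


def clean_module_hint() -> str:
    """Describe whether the lxml_html_clean package named in the traceback is present."""
    if has_module("lxml_html_clean"):
        return "lxml_html_clean is installed"
    return (
        "lxml_html_clean is missing; the error message suggests installing "
        "lxml[html_clean] or lxml_html_clean"
    )


print(clean_module_hint())
```

If the hint reports the package as missing, installing either of the two names from the error message with pip should get the import past this point.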

@jordanralba

Ran into the same issue. Hopefully, they update their documentation shortly.

e-ave commented Sep 24, 2024

How do I get it to work? I installed lxml_html_clean, but r.html.render() still returns None because r is a Response object that doesn't have an html property.

e-ave commented Sep 24, 2024

Okay, I figured it out, but only by downgrading to version 0.9.0. I still couldn't get 0.10.0 to work because everything returns requests objects instead of requests_html objects.

The README says to do:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://python.org/')
rendered_html = r.html.render()

but session.get returns a requests.models.Response from the normal requests library, which doesn't have an html attribute. You actually need to call session.request instead of session.get. This function returns a requests_html.HTMLResponse, which is what we need.

from requests_html import HTMLSession
session = HTMLSession()
r = session.request(url='https://python.org/', method="GET")
rendered_html = r.html.render()
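
To see which kind of object a call actually returned, a small type check can help. This is only a sketch: `describe_response` is a hypothetical helper, and the two classes below are stand-ins so the snippet runs without a network connection or requests_html installed. In a real session you would pass in the `r` returned by `session.get` or `session.request`.

```python
class PlainResponse:
    """Hypothetical stand-in for requests.models.Response (no .html attribute)."""


class HtmlCapableResponse:
    """Hypothetical stand-in for requests_html.HTMLResponse."""

    html = "<parsed document placeholder>"


def describe_response(resp: object) -> str:
    """Report the concrete class of `resp` and whether it exposes `.html`."""
    cls = type(resp)
    kind = f"{cls.__module__}.{cls.__qualname__}"
    if hasattr(resp, "html"):
        return f"{kind}: has .html"
    return f"{kind}: no .html attribute"


# Stand-in objects only; replace with a real response to diagnose your setup.
print(describe_response(PlainResponse()))
print(describe_response(HtmlCapableResponse()))
```

If the real response reports no .html attribute, you are in the situation described above, where a plain requests Response came back instead of an HTMLResponse.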
