Web Scraping - List Index out of range Error

by: nee_609, 7 years ago


Hi there,

I am working on a web scraping project using the Chrome web driver. About 90% of the time the code works fine, but every now and then it throws a "list index out of range" error. After some research, I figured it may have something to do with giving the browser time to complete the action and produce the new source code before I read it with BeautifulSoup, so I added some time.sleep calls and, where I already had them, increased the time a little. That helped to a certain extent.

Is there any other way I can do this?

Thanks in Advance!

P.S. I'm posting this here because later on I use NLTK to process the text.






"List index out of range" simply means you attempted to reference a list item by index, and that index doesn't exist. It could indeed be the case that you're parsing something by index and that index doesn't exist yet, because the page hasn't been populated by something like JavaScript. If that is what's happening, and it sounds like it is, there's no way around it other than waiting. Instead of one long fixed sleep, you could also loop a few times with a statement that checks whether that index exists yet. Most pages should load really quickly, while others could take a while.
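
Here's a rough sketch of that "loop and check" idea using only the standard library. The name wait_for_index and its get_items argument are made up for illustration: get_items is whatever function re-reads the page and returns your list (for example, re-parsing driver.page_source with BeautifulSoup). It assumes a non-negative index.

```python
import time

def wait_for_index(get_items, index, timeout=10.0, poll=0.5):
    """Call get_items() repeatedly until it has an element at `index`.

    Hypothetical helper; assumes `index` is non-negative.
    Returns that element, or raises TimeoutError if it never shows up.
    """
    deadline = time.monotonic() + timeout
    while True:
        items = get_items()       # re-read the page on every attempt
        if len(items) > index:    # the index exists now, safe to use
            return items[index]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no item at index {index} after {timeout}s")
        time.sleep(poll)          # short pause, then check again
```

Fast pages return on the first pass, and slow ones only wait as long as they need to, up to the timeout.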

-Harrison 7 years ago



Thanks Harrison, I appreciate the quick response. I think wait time is THE only solution, as I know the tags are there. I know this because once the process fails, Chrome is still open on the URL, so I just inspect the page and can see them there. I guess I just need to give it enough time to rebuild the source code before I start parsing it.
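
For anyone reading later: Selenium's explicit wait (WebDriverWait) can do that waiting without a fixed sleep. Below is a minimal sketch, not the code from my project. The helper names at_least and wait_for_elements are invented for this example, and the CSS selector is whatever matches the tags you index into.

```python
def at_least(css, count):
    """Build a wait condition that passes once `count` elements match `css`."""
    def condition(driver):
        found = driver.find_elements("css selector", css)
        # Return the list when there are enough matches;
        # return False so the wait keeps polling otherwise.
        return found if len(found) >= count else False
    return condition

def wait_for_elements(driver, css, count=1, timeout=15):
    """Block until the condition holds, then return the matching elements."""
    # Imported here so the helper above can be used without selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait
    return WebDriverWait(driver, timeout).until(at_least(css, count))
```

After wait_for_elements returns, reading driver.page_source and handing it to BeautifulSoup should see the tags. If they never appear within the timeout, WebDriverWait raises a TimeoutException instead of letting the parse fail with an index error.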

Great site BTW, I really enjoyed the NLTK tutorials.

-nee_609 7 years ago


Hi! Newbie here. I have the same issue: my code works well but randomly stops, throwing the same error. Can you please share your code so I can see where to add the time.sleep? I also know it is not an issue with the list of webpages, because it stops at random places when I run the code again, sometimes correctly fetching a page where I previously received an error.

Thanks!

Alex

-alexheca 3 years ago
