I am currently writing an Android app that, among other things, uses text information from websites which I do not own. In addition, some of the pages require authentication.
For some pages I have been able to log in and retrieve the HTML using BasicNameValuePair objects and an HttpClient with its associated objects.
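For reference, a form login of this kind can be sketched roughly as follows. This is not the code from the app: it swaps the Apache HttpClient classes for the standard `HttpURLConnection`, and the class name `FormLogin`, the helper names, the login URL and the field names are all hypothetical. It also assumes the site accepts a plain `application/x-www-form-urlencoded` POST.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormLogin {

    // Encode name/value pairs as an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    // POST the form to loginUrl (hypothetical) and return the response body as text.
    static String postForm(String loginUrl, Map<String, String> fields) throws IOException {
        byte[] payload = encodeForm(fields).getBytes("UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            StringBuilder html = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    html.append(line).append('\n');
                }
            }
            return html.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");   // hypothetical field names and values
        fields.put("password", "secret");
        // Hypothetical URL; replace with the real login endpoint.
        System.out.println(postForm("https://example.com/login", fields));
    }
}
```

On Android the request has to run off the main thread. Pages behind the login usually also need the session cookie that the login response sets; installing a `java.net.CookieManager` with `CookieHandler.setDefault` before the first request is one way to keep it.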
I've done my research, but everything I've found is guesswork and extremely confusing. I'm okay with ignoring pages that require login for now. Also, I am willing to post any code that may be useful for constructing a solution; it is an independent project.
The aforementioned solutions are very slow and restrict you to one URL (well, not really, but I dare you to scrape 10 URLs with Rhino while your user is impatiently waiting for results).
An alternative is to use a cloud scraping solution. You get the benefit of not wasting phone bandwidth on downloading content you won't use.
Try this solution: Bobik Java SDK
It gives you the ability to scrape up to hundreds of sites in a matter of seconds.
Other Things I Tried:
Things That Might Work:
Further results will be posted. Other people's results will be added if posted.
Note: many of the options listed above reference each other. I think Rhino is included in both SL4A and HtmlUnit. Also, I think HtmlUnit contains Selenium.