I have seen this question posted before, but my situation is somewhat different so I was hoping I could get some help from the community and maybe a fresh perspective. I have a macro written in vba that's supposed to pull return data from this company's online db, things like returns for MSCI World Index, S&P 500, etc. The code I have works on other pages, but I think this one is different. I talked to the webmaster and he told me that the code was not designed to be scraped, but that is not restricted by their usage policy. It would be a huge time saver for me if I could, indeed, get the data by scraping so I'm trying really hard to figure out a way to do it. I've tagged this under java-script as well, as I think the code would be very similar and I want to accept as many solutions as possible to solve this problem.
The situation is this: I have the following code that throws me an "Object variable not set" error when it comes to the actual scraping of the data (the line that begins 'set els = htmlDoc...." I've tried many combinations of the getElement(s) function thinking that may have been the problem, but I've drawn a blank. Anybody know any other ways to set the object variable in this environment? Or just any other creative ways to pull the data.
I can't give out the login info, but I think by just navigating to the 'caRetPage' site, you can see the html code that I'm trying to scrape/parse.
Sub caScrape() Dim ie As Object 'ie: internet explorer Dim htmlDoc As MSHTML.HTMLDocument Dim els As Object 'to store html objects Dim rtn As String 'to store values to be scraped from page Dim loginButton As Object caLoginPage = "https://members.cambridgeassociates.com/Login/Forms/login-form.asp" caRetPage = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp" caUser = "xxxxx" caPass = "xxxxx" Set ie = CreateObject("internetexplorer.application") ie.Visible = True ie.navigate caLoginPage While ie.Busy DoEvents Wend Do Until ie.readyState = 4 DoEvents Loop Set htmlDoc = ie.document 'Log in to site Set loginButton = htmlDoc.getElementsByTagName("button").Item(0) With htmlDoc .all("Username").Value = caUser .all("Password").Value = caPass loginButton.Click End With While ie.Busy DoEvents Wend Set acceptButton = htmlDoc.getElementsByName("Submit").Item(0) acceptButton.Click While ie.Busy DoEvents Wend 'Here is the page with the return data on it ie.navigate caRetPage While ie.Busy DoEvents Wend Do Until ie.readyState = 4 DoEvents Loop Set htmlDoc = ie.document 'This next line is where the error gets thrown Set els = htmlDoc.getElementById("tblData")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1) 'Also tried the following and plenty of variations of getElement command 'Set els = htmlDoc.getElementsByTagName("body")(0).getElementsByTagName("table")(2).getElementsByTagName("tbody")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1) rtn = els.innerText Debug.Print(rtn) End Sub
Any help would be greatly appreciated.
I believe I have a working example with the URL provided in the code posted above. From what I can tell there are frames on the website, so you need to handle these slightly differently.
In addition, waiting for the page to load needed a different method. For that I reused some code I got some another StackOverflow answer. Basically it waits until it doesn't find anymore tags on a page as it is loading.
Here's the code:
#If VBA7 Then Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) #Else Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) #End If Sub getData() Dim element As Object Dim MyURL As String MyURL = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp" 'Late binding open IE Dim MyBrowser As Object: Set MyBrowser = CreateObject("InternetExplorer.Application") MyBrowser.Visible = True MyBrowser.navigate MyURL waitforload MyBrowser Set element = MyBrowser.document.getelementsByTagName("Frameset")(0).Children(1).contentdocument Set element = element.getelementByID("tblData") Set element = element.getelementsByTagName("tr")(5) Set element = element.getelementsByTagName("td")(1) Debug.Print element.innertext End Sub Private Sub waitforload(ByRef ie As Object) Dim i As Byte Dim tagnames As Long While ie.Busy Sleep 250 DoEvents Wend While ie.ReadyState <> 4 Sleep 250 DoEvents Wend Do tagnames = ie.document.getelementsByTagName("*").Length For i = 1 To 5 Sleep 75 If tagnames = ie.document.getelementsByTagName("*").Length Then Exit Sub Next Loop End Sub
This should return:
Further to Tim's eagle eye, here is some documentation. The error (as Tim spotted) is that getElementById does not return a list of HTML elements but a single element.
Here is a documentation link
©2020 All rights reserved.