Can't set object variable in HTMLDocument for scraping data from website

I have seen this question posted before, but my situation is somewhat different so I was hoping I could get some help from the community and maybe a fresh perspective. I have a macro written in vba that's supposed to pull return data from this company's online db, things like returns for MSCI World Index, S&P 500, etc. The code I have works on other pages, but I think this one is different. I talked to the webmaster and he told me that the code was not designed to be scraped, but that is not restricted by their usage policy. It would be a huge time saver for me if I could, indeed, get the data by scraping so I'm trying really hard to figure out a way to do it. I've tagged this under java-script as well, as I think the code would be very similar and I want to accept as many solutions as possible to solve this problem.

The situation is this: I have the following code that throws me an "Object variable not set" error when it comes to the actual scraping of the data (the line that begins 'set els = htmlDoc...." I've tried many combinations of the getElement(s) function thinking that may have been the problem, but I've drawn a blank. Anybody know any other ways to set the object variable in this environment? Or just any other creative ways to pull the data.

I can't give out the login info, but I think by just navigating to the 'caRetPage' site, you can see the html code that I'm trying to scrape/parse.

Sub caScrape()

Dim ie As Object        'ie: internet explorer
Dim htmlDoc As MSHTML.HTMLDocument
Dim els As Object   'to store html objects
Dim rtn As String   'to store values to be scraped from page
Dim loginButton As Object
caLoginPage = "https://members.cambridgeassociates.com/Login/Forms/login-form.asp"
caRetPage = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp"
caUser = "xxxxx"
caPass = "xxxxx"
Set ie = CreateObject("internetexplorer.application")
ie.Visible = True
ie.navigate caLoginPage
While ie.Busy
    DoEvents
Wend
Do Until ie.readyState = 4
    DoEvents
Loop
Set htmlDoc = ie.document
'Log in to site
Set loginButton = htmlDoc.getElementsByTagName("button").Item(0)
With htmlDoc
    .all("Username").Value = caUser
    .all("Password").Value = caPass
    loginButton.Click
End With
While ie.Busy
    DoEvents
Wend
Set acceptButton = htmlDoc.getElementsByName("Submit").Item(0)
acceptButton.Click
While ie.Busy
    DoEvents
Wend


'Here is the page with the return data on it                                                                                   
ie.navigate caRetPage
While ie.Busy
    DoEvents
Wend
Do Until ie.readyState = 4
    DoEvents
Loop
Set htmlDoc = ie.document

'This next line is where the error gets thrown
Set els = htmlDoc.getElementById("tblData")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1)
    'Also tried the following and plenty of variations of getElement command
'Set els = htmlDoc.getElementsByTagName("body")(0).getElementsByTagName("table")(2).getElementsByTagName("tbody")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1)

rtn = els.innerText
Debug.Print(rtn)


End Sub

Any help would be greatly appreciated.

Answers:

Answer

I believe I have a working example with the URL provided in the code posted above. From what I can tell there are frames on the website, so you need to handle these slightly differently.

In addition, waiting for the page to load needed a different method. For that I reused some code I got some another StackOverflow answer. Basically it waits until it doesn't find anymore tags on a page as it is loading.

Here's the code:

#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub getData()
    Dim element     As Object
    Dim MyURL       As String
    MyURL = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp"

    'Late binding open IE
    Dim MyBrowser   As Object: Set MyBrowser = CreateObject("InternetExplorer.Application")
    MyBrowser.Visible = True
    MyBrowser.navigate MyURL

    waitforload MyBrowser

    Set element = MyBrowser.document.getelementsByTagName("Frameset")(0).Children(1).contentdocument
    Set element = element.getelementByID("tblData")
    Set element = element.getelementsByTagName("tr")(5)
    Set element = element.getelementsByTagName("td")(1)
    Debug.Print element.innertext
End Sub

Private Sub waitforload(ByRef ie As Object)
    Dim i        As Byte
    Dim tagnames As Long

    While ie.Busy
        Sleep 250
        DoEvents
    Wend

    While ie.ReadyState <> 4
        Sleep 250
        DoEvents
    Wend

    Do
        tagnames = ie.document.getelementsByTagName("*").Length
        For i = 1 To 5
            Sleep 75
            If tagnames = ie.document.getelementsByTagName("*").Length Then Exit Sub
        Next
    Loop
End Sub

This should return: 0.10

Answer

Further to Tim's eagle eye, here is some documentation. The error (as Tim spotted) is that getElementById does not return a list of HTML elements but a single element.

Here is a documentation link

https://msdn.microsoft.com/en-us/library/ms536437(v=vs.85).aspx

Tags

Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.