Thanks for taking interest in this.
I was given the [tedious] task of looking up the country of origin of some medicines, as registered with the Colombian food and drug administration. The agency's website is built on JSP pages (.jsp extension), and I would like to know whether the process can be automated. This is the step by step of the lookup:
I don't have the slightest idea whether this can be accomplished, and if so, how; so I'd appreciate any guidance that allows me to start in any direction (other than the one I have at hand now: looking them up by hand!). I'm familiar with R and some VB, but if it's possible in any other language, I'll give it a try.
What I've tried:
Thanks!
I have used phantomjs with the RSelenium package. The vignette at http://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-saucelabs.html#id2a explains how to set up phantomjs. phantomjs can be driven directly, without the need for a Selenium Server. Because it is headless, it should be a lot quicker for the task you outline.
The first part of your question can be achieved as follows:
appURL <- "http://web.sivicos.gov.co:8080/consultas/consultas/consreg_encabcum.jsp"
library(RSelenium)
pJS <- phantom()
remDr <- remoteDriver(browserName = "phantom")
remDr$open()
remDr$navigate(appURL)
# Get the third list item of the select box (MEDICAMENTOS)
webElem <- remDr$findElement("css", "select[name='grupo'] option:nth-child(3)")
webElem$clickElement() # select this element
# Send text to the input with value="" and name="expediente"
webElem <- remDr$findElement("css", "input[name='expediente']")
webElem$sendKeysToElement(list("2203")) # keys are sent as a character string
# Click the Buscar button
remDr$findElement("id", "INPUT2")$clickElement()
Now the form has been filled in and the Buscar button clicked. The data is in an iframe with name="datos". The driver has to switch into an iframe before its contents can be accessed:
# switch to datos iframe
remDr$switchToFrame(remDr$findElement("css", "iframe[name='datos']"))
remDr$findElement("css", "a")$clickElement() # click the link given in the iframe
# get the resulting data
appData <- remDr$getPageSource()[[1]]
# close phantom js
pJS$stop()
The data for the iframe is now contained in appData. As an example, we look at the third table using the simple extraction function readHTMLTable from the XML package:
library(XML) # provides readHTMLTable
readHTMLTable(appData, which = 3)
                      V1     V2      V3              V4       V5                      V6
1 Presentacion Comercial   <NA>    <NA>            <NA>     <NA>                    <NA>
2             Expediente Consec Termino Unidad / Medida Cantidad             Descripcion
3              000002203     01    0176              ml    60,00  FRASCO AMBAR POR 60 ML
4              000002203     02    0176              ml   120,00 FRASCO AMBAR POR 120 ML
5              000002203     03    0176              ml    90,00  FRASCO AMBAR POR 90 ML
          V7     V8            V9
1       <NA>   <NA>          <NA>
2 Fecha insc Estado Fecha Inactiv
3 2007/01/30 Activo
4 2007/01/30 Activo
5 2012/03/15 Activo
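In the output above, row 1 is the table title and row 2 holds the real column names. A minimal sketch of tidying this up in base R follows. `tidy_table` is a hypothetical helper, and `raw` is a hand-typed copy of four of the columns shown above, so the example runs without a browser session.

```r
# Hand-typed copy of columns V1, V2, V5 and V7 from the output above.
raw <- data.frame(
  V1 = c("Presentacion Comercial", "Expediente", "000002203", "000002203", "000002203"),
  V2 = c(NA, "Consec", "01", "02", "03"),
  V5 = c(NA, "Cantidad", "60,00", "120,00", "90,00"),
  V7 = c(NA, "Fecha insc", "2007/01/30", "2007/01/30", "2012/03/15"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the title row and promote row 2 to column names.
tidy_table <- function(tab) {
  tab[] <- lapply(tab, as.character)          # readHTMLTable may return factors
  header <- unlist(tab[2, ], use.names = FALSE)
  body <- tab[-(1:2), , drop = FALSE]
  names(body) <- header
  rownames(body) <- NULL
  body
}

presentaciones <- tidy_table(raw)

# The site uses a decimal comma and yyyy/mm/dd dates; convert both.
presentaciones$Cantidad <- as.numeric(sub(",", ".", presentaciones$Cantidad, fixed = TRUE))
presentaciones[["Fecha insc"]] <- as.Date(presentaciones[["Fecha insc"]], format = "%Y/%m/%d")

print(presentaciones)
```

The same helper could be applied to the data frame returned by readHTMLTable, assuming the other tables on the page follow the same title-row-then-header-row layout, which is not verified here.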