Scrape Website Data into Excel using VBA
I'll be showing you an example on how to Scrape Data from a Website into Excel Worksheet using VBA. We'll be scraping data from www(...
http://msexcel-tutorials.blogspot.com/2014/03/scrape-website-data-into-excel-using-vba_14.html
I'll be showing you an example on how to Scrape Data from a Website into Excel Worksheet using VBA. We'll be scraping data from www(dot)renewableuk(dot)com. Please also read the privacy policy of the website before mining data.
Goal:
Get all data under all column headings which can be found on this website i.e.
Wind Project, Region, ..., Type of Project
Requirements:
Usage:
VBA CODE:
Goal:
Get all data under all column headings which can be found on this website i.e.
Wind Project, Region, ..., Type of Project
Requirements:
You need to add a reference, Microsoft HTML Object Library on your VBA project.
Usage:
You can call the ProcessWeb() sub directly by pressing F5 on the Microsoft Visual Basic Window.
Or you can add a button on your excel worksheet then assign ProcessWeb() as the macro.
VBA CODE:
Function ScrapeWebPage(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim tmpDoc As New HTMLDocument
Dim WS As Worksheet
Dim i As Integer, row As Integer
Dim File As Integer
Dim Filename As String
Dim DataLine As String
File = FreeFile
Filename = ActiveWorkbook.Path & "\html.log"
Set WS = Sheets("DATA")
'create new XMLHTTP Object
Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")
XMLHttpRequest.Open "GET", URL, False
XMLHttpRequest.send
While XMLHttpRequest.readyState <> 4
DoEvents
Wend
With HTMLDoc.body
'Set HTML Document
.innerHTML = XMLHttpRequest.responseText
'Get only Order List Tag of HTML Document
Set orderedlists = .getElementsByTagName("ol")
'Reset the Document to the HTML of the second ordered list element
'where we only need to extract the data
.innerHTML = orderedlists(1).innerHTML
'Now, we'll get the list items
Set ListItems = .getElementsByTagName("li")
'Open our log file for output stream
Open Filename For Output As #File
For Each li In ListItems
With tmpDoc.body
'Set the temp doc
.innerHTML = li.innerHTML
'There are about 10 columns, so there are 10 p's
Set ps = .getElementsByTagName("p")
For Each p In ps
'Print only the text, excluding the tags
Print #File, p.innerText
Next
End With
Next
'close the file
Close #File
End With
'Open the file again, we'll use it to retrieve each data lines
Open Filename For Input As #File
'Last row of the worksheet
row = lastRow + 1
While Not EOF(File)
For i = 1 To 10
'read the data from the log file
Line Input #File, DataLine
'Put the data on the 1st to 10th column
WS.Cells(row, i).Value = DataLine
Next i
row = row + 1
Wend
Close #File
End Function
'Get the total number pages we need to scrape
Function totalPage() As Integer
Dim HTMLDoc As New HTMLDocument
Dim tmpDoc As New HTMLDocument
Dim html As String
Dim mask As String
Dim URL As String
URL = "http://www.renewableuk.com/en/renewable-energy/wind-energy/uk-wind-energy-database/index.cfm"
Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")
XMLHttpRequest.Open "GET", URL, False
XMLHttpRequest.send
html = XMLHttpRequest.responseText
With HTMLDoc.body
.innerHTML = Mid(html, InStr(1, html, ""), 300)
mask = Mid(.innerHTML, InStr(1, LCase(.innerHTML), "") - 2, 2)
End With
totalPage = mask
End Function
Function lastRow() As Long
lastRow = Range("A65536").End(xlUp).row
End Function
Sub ProcessWeb()
Dim URL As String
Dim i As Integer
Range("2:2", Selection.End(xlDown)).ClearContents
Range("A2").Select
Application.ScreenUpdating = False
Application.Cursor = xlWait
URL = "http://www.renewableuk.com/en/renewable-energy/wind-energy/uk-wind-energy-database/index.cfm/page/"
For i = 1 To totalPage
ScrapeWebPage URL & i
Application.StatusBar = "Please wait while processing page " & i & " of " & totalPage & "..."
Next i
Application.ScreenUpdating = True
Application.Cursor = xlDefault
Application.StatusBar = ""
MsgBox "Data Extraction is Done!"
End Sub