Scrape Website Data into Excel using VBA
I'll be showing you an example on how to Scrape Data from a Website into Excel Worksheet using VBA. We'll be scraping data from www(...
https://msexcel-tutorials.blogspot.com/2014/03/scrape-website-data-into-excel-using-vba_14.html
I'll be showing you an example on how to Scrape Data from a Website into Excel Worksheet using VBA. We'll be scraping data from www(dot)renewableuk(dot)com. Please also read the privacy policy of the website before mining data.

Goal:
Get all data under all column headings which can be found on this website i.e.
Wind Project, Region, ..., Type of Project
Requirements:
Usage:
VBA CODE:

Goal:
Get all data under all column headings which can be found on this website i.e.
Wind Project, Region, ..., Type of Project
Requirements:
You need to add a reference, Microsoft HTML Object Library on your VBA project.
Usage:
You can call the ProcessWeb() sub directly by pressing F5 on the Microsoft Visual Basic Window.
Or you can add a button on your excel worksheet then assign ProcessWeb() as the macro.
VBA CODE:
1 | Function ScrapeWebPage( ByVal URL As String )<br> Dim HTMLDoc As New HTMLDocument<br> Dim tmpDoc As New HTMLDocument<br> <br> Dim WS As Worksheet<br> <br> Dim i As Integer , row As Integer <br> Dim File As Integer <br> Dim Filename As String <br> Dim DataLine As String <br> File = FreeFile<br> <br> Filename = ActiveWorkbook.Path & "\html.log" <br> <br> Set WS = Sheets( "DATA" )<br> <br> 'create new XMLHTTP Object<br> Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")<br> XMLHttpRequest.Open "GET", URL, False<br> XMLHttpRequest.send<br> <br> While XMLHttpRequest.readyState <> 4<br> DoEvents<br> Wend<br><br> With HTMLDoc.body<br> 'Set HTML Document<br> .innerHTML = XMLHttpRequest.responseText<br> <br> 'Get only Order List Tag of HTML Document<br>Set orderedlists = .getElementsByTagName("ol")<br> <br> 'Reset the Document to the HTML of the second ordered list element<br> 'where we only need to extract the data<br> .innerHTML = orderedlists(1).innerHTML<br> <br> 'Now, we'll get the list items<br> Set ListItems = .getElementsByTagName("li")<br> <br> 'Open our log file for output stream<br> Open Filename For Output As #File<br> For Each li In ListItems<br> <br> With tmpDoc.body<br> 'Set the temp doc<br> .innerHTML = li.innerHTML<br> <br> 'There are about 10 columns, so there are 10 p's<br> Set ps = .getElementsByTagName("p")<br> <br> For Each p In ps<br> 'Print only the text, excluding the tags<br> Print #File, p.innerText<br> Next<br> <br> End With<br> Next<br> 'close the file<br> Close #File<br> <br> End With<br> <br> 'Open the file again, we'll use it to retrieve each data lines<br> Open Filename For Input As #File<br> <br> 'Last row of the worksheet<br> row = lastRow + 1<br> <br> While Not EOF(File)<br> For i = 1 To 10<br> 'read the data from the log file<br> Line Input #File, DataLine<br> <br> 'Put the data on the 1st to 10th column<br> WS.Cells(row, i).Value = DataLine<br> <br> Next i<br> row = row + 1<br> Wend<br> Close #File<br> <br>End Function<br><br>'Get the total number pages we need to scrape<br>Function totalPage() As Integer<br> Dim HTMLDoc As New HTMLDocument<br> Dim tmpDoc As New HTMLDocument<br> Dim html As String<br> Dim mask As String<br> Dim URL As String<br> <br> URL = "http://www.renewableuk.com/en/renewable-energy/wind-energy/uk-wind-energy-database/index.cfm"<br> <br> Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")<br> XMLHttpRequest.Open "GET", URL, False<br> XMLHttpRequest.send<br> <br> html = XMLHttpRequest.responseText<br> <br> With HTMLDoc.body<br> .innerHTML = Mid(html, InStr(1, html, "<span chr="" class=" & Chr(34) & " page-dotted="">"), 300)<br> mask = Mid(.innerHTML, InStr(1, LCase(.innerHTML), "</span>") - 2, 2)<br> End With<br> <br> totalPage = mask<br> <br>End Function<br><br>Function lastRow() As Long<br> lastRow = Range("A65536").End(xlUp).row<br>End Function<br><br>Sub ProcessWeb()<br> Dim URL As String<br> Dim i As Integer<br> <br> Range("2:2", Selection.End(xlDown)).ClearContents<br> Range("A2").Select<br> <br> Application.ScreenUpdating = False<br> Application.Cursor = xlWait<br> <br> URL = "http://www.renewableuk.com/en/renewable-energy/wind-energy/uk-wind-energy-database/index.cfm/page/"<br> <br> For i = 1 To totalPage<br> ScrapeWebPage URL & i<br> Application.StatusBar = "Please wait while processing page " & i & " of " & totalPage & "..."<br> Next i<br> <br> Application.ScreenUpdating = True<br> Application.Cursor = xlDefault<br> Application.StatusBar = ""<br> <br> MsgBox "Data Extraction is Done!"<br> <br>End Sub<br> |