
How do I get the page source from a page AFTER it's rendered and all dynamic javascript has run?

Hi all,

I'm going nuts with this one.
I'm trying to get the page source of a web page from a Windows application (not ASP.NET).

But if you go to any website that has embedded code, or JavaScript that dynamically builds HTML, the raw script is all you get back when using WebRequest or WebClient.
I see the actual JavaScript rather than the HTML it generates, as on any website that has, for example, an AWeber form in it.

In Google Chrome, if you right-click the item that was created dynamically and choose "Inspect Element", it shows the HTML as it is after the page has fully rendered.
That's the page source I'm looking for.

Anybody have any ideas how to accomplish this?
I'm using VB.net/VS2008.

Thanks!
Denvas  Monday, September 21, 2009 1:46 AM
Hi,

Where and how are you getting the page source at the moment? I presume from some property on the WebBrowser control; which one?

You may need to construct the 'source file' yourself by walking the DOM and appending the HTML for each element to a string, but I am not sure. Technically the page 'source' is the JavaScript, and the HTML is what the 'source' constructs at runtime. It's possible that what Chrome does is walk the DOM rather than just display the 'source' of a page.
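
A minimal sketch of that idea, assuming a WinForms WebBrowser control that has already loaded a page. Rather than walking every node by hand, it asks the root html element for its OuterHtml, which should reflect the DOM as it currently stands instead of the text that was downloaded. The module and function names (RenderedHtml, GetRenderedHtml) are hypothetical, not something from this thread.

```vbnet
Imports System.Windows.Forms

Public Module RenderedHtml
    ' Hypothetical helper: returns the browser's current DOM as HTML,
    ' or an empty string if no document is loaded yet.
    Public Function GetRenderedHtml(ByVal browser As WebBrowser) As String
        Dim doc As HtmlDocument = browser.Document
        If doc Is Nothing Then Return String.Empty

        ' The root <html> element covers both <head> and <body>.
        Dim roots As HtmlElementCollection = doc.GetElementsByTagName("html")
        If roots.Count > 0 Then Return roots(0).OuterHtml

        ' Fall back to the body alone if no root element was found.
        If doc.Body Is Nothing Then Return String.Empty
        Return doc.Body.OuterHtml
    End Function
End Module
```

Calling it as txtHTML.Text = GetRenderedHtml(Browzr) only gives the generated markup if the page's scripts have finished by then, so when it is called still matters.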
Yort  Monday, September 21, 2009 2:29 AM
Hi Yort,

Thanks for your response...

Definitely giving me something to think about.

I was using:
Imports System.IO
Imports System.Net

' Downloads the raw response body. No script is executed here,
' so any HTML the page builds with JavaScript will be missing.
Private Function readHtmlPage(ByVal url As String) As String
    Dim objRequest As WebRequest = WebRequest.Create(url)

    ' Using disposes the response and the reader, so no explicit Close is needed.
    Using objResponse As WebResponse = objRequest.GetResponse()
        Using sr As New StreamReader(objResponse.GetResponseStream())
            Return sr.ReadToEnd()
        End Using
    End Using
End Function

And that was just giving me the 'source' w/javascript - as you said. And I can't figure out how to render it in order to get the HTML.

So I went to the WebBrowser control (unfortunately).

Now, it's acting weird.

I do the following:
Private Sub btnGo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGo.Click
    txtURL.Text = "somesite"

    Try
        ' False = load in this control instead of opening a new window
        Browzr.Navigate(txtURL.Text, False)
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try
End Sub

Private Sub Browzr_DocumentCompleted(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles Browzr.DocumentCompleted
    txtHTML.Text = Browzr.DocumentText
End Sub

Private Sub btnRendered_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnRendered.Click
    txtHTML.Text = Browzr.DocumentText
End Sub

The DocumentCompleted event returns the 'source'. However, if I click the btnRendered button a second afterward, I get the HTML rendered version I want.
This has to be automatic, so I'm not going to be pressing buttons; the button is just for testing.

Isn't DocumentCompleted the same thing as "Rendered"? If not, why isn't there an event called Rendered? :(

Not sure what to do to get the DocumentCompleted to show the rendered HTML.

I'm close, just can't get over this little hump.

Thanks,
- Denvas
Denvas  Monday, September 21, 2009 5:12 AM
K, I got it - jeez LOL...

I put my code in:
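Private Sub Browzr_ProgressChanged(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserProgressChangedEventArgs) Handles Browzr.ProgressChanged
    ' Only read the document once the control reports the load as complete
    If e.CurrentProgress = e.MaximumProgress Then
        Dim doc As HtmlDocument = Browzr.Document

        ' Document or Body can still be Nothing early in a navigation
        If doc IsNot Nothing AndAlso doc.Body IsNot Nothing Then
            txtHTML.Text = doc.Body.OuterHtml
        End If
    End If
End Sub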

Seems a bit ugly, but worx. :)

Thanks again for the help,
Denvas
  • Marked As Answer by Denvas Monday, September 21, 2009 7:00 AM
Denvas  Monday, September 21, 2009 7:00 AM
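
For comparison with the ProgressChanged approach above, here is a hedged sketch of one alternative: wait for DocumentCompleted on the top-level document, then give the page's scripts a short grace period before reading the DOM. The control names follow the thread; the class name, the settleTimer field, and the one-second SettleDelayMs value are assumptions made for illustration, and a fixed delay is a guess rather than a guarantee that every script has finished.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class RenderedHtmlForm
    Inherits Form

    Private WithEvents Browzr As New WebBrowser()
    Private WithEvents settleTimer As New System.Windows.Forms.Timer()
    Private txtHTML As New TextBox()

    ' Hypothetical grace period for scripts to finish; tune per site.
    Private Const SettleDelayMs As Integer = 1000

    Public Sub New()
        Browzr.Dock = DockStyle.Top
        Browzr.Height = 300
        txtHTML.Multiline = True
        txtHTML.Dock = DockStyle.Fill
        Controls.Add(txtHTML)
        Controls.Add(Browzr)
        settleTimer.Interval = SettleDelayMs
    End Sub

    Private Sub Browzr_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles Browzr.DocumentCompleted
        ' DocumentCompleted can fire once per frame, so ignore
        ' everything except the top-level document.
        If Browzr.ReadyState <> WebBrowserReadyState.Complete Then Return
        If Not e.Url.Equals(Browzr.Url) Then Return

        ' Restart the countdown so only the last completion triggers a read.
        settleTimer.Stop()
        settleTimer.Start()
    End Sub

    Private Sub settleTimer_Tick(ByVal sender As Object, ByVal e As EventArgs) Handles settleTimer.Tick
        settleTimer.Stop()

        Dim doc As HtmlDocument = Browzr.Document
        If doc IsNot Nothing AndAlso doc.Body IsNot Nothing Then
            txtHTML.Text = doc.Body.OuterHtml
        End If
    End Sub
End Class
```

The timer runs on the UI thread, so the handler can touch Browzr and txtHTML directly. Pages that keep modifying the DOM after the delay would still need a longer wait or a page-specific check.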
