
Windows Batch / Parse Data From Html Web Page

Is it possible to parse data from a web page's HTML using a Windows batch script? Let's say I have a web page, www.domain.com/data/page/1, with this page source HTML: ...

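Solution 1:

The batch language isn't well suited to parsing markup and data formats such as HTML, XML, or JSON. In such cases it helps to use a hybrid script and borrow JScript or PowerShell methods to scrape the data you need. Here's an example of a batch + JScript hybrid script. Its first line is what makes the file valid in both languages: cmd.exe runs it as an ordinary if command whose comparison is false, while JScript treats everything from @if to @end as a conditional-compilation block and skips the batch section. Save it with a .bat extension and run it.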

@if (@CodeSection == @Batch) @then
@echo off & setlocal

set "url=http://www.domain.com/data/page/1"

for /f "delims=" %%I in ('cscript /nologo /e:JScript "%~f0" "%url%"') do (
    rem // do something useful with %%I
    echo Link found: %%I
)

goto :EOF
@end // end batch / begin JScript hybrid code

// returns a DOM root object
function fetch(url) {
    var XHR = WSH.CreateObject("Microsoft.XMLHTTP"),
        DOM = WSH.CreateObject('htmlfile');

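    // asynchronous GET; poll until the request completes (readyState 4)
    XHR.open("GET",url,true);
    XHR.setRequestHeader('User-Agent','XMLHTTP/1.0');
    XHR.send('');
    while (XHR.readyState!=4) {WSH.Sleep(25)};
    // request IE9 document mode from the htmlfile parser before loading the markup
    DOM.write('<meta http-equiv="x-ua-compatible" content="IE=9" />');
    DOM.write(XHR.responseText);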
    return DOM;
}

var DOM = fetch(WSH.Arguments(0)),
    links = DOM.getElementsByTagName('a');

for (var i in links)
    if (links[i].href && /\/post\/view\//i.test(links[i].href))
        WSH.Echo(links[i].href);
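
As a sketch of what the "do something useful" step might look like: if only the numeric post ID is needed rather than the whole URL, the filter could capture it with a regular expression. The helper name extractPostIds and the sample links are assumptions made for illustration and are not part of the original answer. The code sticks to ES3-style syntax, so it should run under cscript as JScript as well as under Node.js.

```javascript
// Hypothetical helper (not from the original answer): returns the numeric
// IDs that follow /post/view/ in an array of href strings.
function extractPostIds(hrefs) {
    var pattern = /\/post\/view\/(\d+)/i,
        ids = [],
        match,
        i;

    for (i = 0; i < hrefs.length; i++) {
        match = pattern.exec(hrefs[i]);
        if (match) ids.push(match[1]);   // match[1] is the captured ID
    }
    return ids;
}

// Output shim: WSH.Echo under cscript, console.log elsewhere (e.g. Node.js).
var print = (typeof WSH !== 'undefined')
    ? function (s) { WSH.Echo(s); }
    : function (s) { console.log(s); };

// Sample links, invented for illustration.
var sampleLinks = [
    'http://www.domain.com/post/view/664654',
    'http://www.domain.com/about',
    'http://www.domain.com/post/view/12?ref=home'
];

print(extractPostIds(sampleLinks).join(','));
```

In the hybrid script, the same capture could replace the test inside the loop over links, so that the batch side receives one ID per line instead of one URL per line.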

Solution 2:

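If you only need to extract a path such as /post/view/664654, you can run grep over the saved page. grep is not part of Windows, so this assumes a port such as GnuWin32 or the copy bundled with Git for Windows, and the single-quoted pattern assumes a POSIX-style shell such as Git Bash, e.g.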

grep -o '/post/view/[^"]\+' *.html
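
For readers who would rather stay inside the hybrid script, the same pattern can be approximated with a JavaScript regular expression run over the raw page text. This is a minimal sketch under assumptions: findPostPaths is a hypothetical name, the sample markup is invented because the question elides the real page source, and the character class also excludes line breaks to mimic grep's line-by-line matching.

```javascript
// Hypothetical helper: returns every substring of `html` that
// grep -o '/post/view/[^"]\+' would be expected to print.
function findPostPaths(html) {
    // [^"\r\n]+ stops at the closing quote and never crosses a line break.
    var pattern = /\/post\/view\/[^"\r\n]+/g,
        found = [],
        match;

    // With the g flag, exec() resumes after the previous match each time.
    while ((match = pattern.exec(html)) !== null) {
        found.push(match[0]);
    }
    return found;
}

// Output shim: WSH.Echo under cscript, console.log elsewhere (e.g. Node.js).
var print = (typeof WSH !== 'undefined')
    ? function (s) { WSH.Echo(s); }
    : function (s) { console.log(s); };

// Invented sample markup, for illustration only.
var sampleHtml = '<a href="/post/view/664654">first</a>\n' +
                 '<a href="/about">about</a>\n' +
                 '<a href="/post/view/664655">second</a>';

print(findPostPaths(sampleHtml).join('\n'));
```

Like the grep command, this matches text rather than parsed elements, so it would also pick up the pattern inside comments or scripts.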

For parsing more complex HTML, you can use HTML-XML-utils or pup.
