Crawling using PHP
Since there is a lot of demand for data for market research, price intelligence, competitor analysis and so on, the demand for automating the process of scraping that data has also grown. This is where web scraping comes into play. Web scraping is the automated process of extracting data from the web in a format of your choice.
Web scraping has become essential for a number of reasons. First, the data you can view on the web is usually not available for download, yet you need it downloaded, often in a different format. So you need a way to download the data from multiple pages of a site or from multiple sites. That is what web scraping provides.
Web scraping is also needed because you do not have the time to work out how to download, copy and save the data you see on a page. What you need is a simple, automated way of scraping whatever data you see on the web page, hence web scraping! Apart from giving you the data you need, web scraping saves you the many man-hours you would otherwise spend collecting the data manually.
Sometimes the source site offers no API, and web scraping is then the only way to extract the data.
Web Crawler/Scraper using PHP
Many applications have been written in PHP for this purpose. Before going further, note that the code below for crawling HTML pages from the internet can also be found in the manual.
In PHP, file_get_contents(url) takes the URL of an online resource as a string, reads the whole page source into a string and returns that string (or false on failure). The preg_match_all() function takes a regular expression describing what you are looking for in the page, such as tags, together with the string to search, and stores every match it finds in its third argument, $matches. After that you can call the crawl function again on each match using a loop.
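To see what preg_match_all() puts into its third argument, here is a minimal sketch that runs the crawler's pattern against a fixed string instead of a live page. The sample markup and the example.com URLs are made up for illustration; on a real page the string would come from file_get_contents().

```php
<?php
// Hypothetical sample markup standing in for the string that
// file_get_contents($url) would return for a real page.
$html = '<p><a href="https://example.com/a">A</a> and '
      . '<a class="nav" href="https://example.com/b">B</a></p>';

// Same pattern as the crawler: group 1 captures the href value.
// preg_match_all() returns the number of full matches it found.
$count = preg_match_all('~<a.*?href="(.*?)".*?>~', $html, $matches);

// $matches[0] holds the full tag matches, $matches[1] the captured URLs.
foreach ($matches[1] as $link) {
    echo $link, "\n";
}
```

The loop prints one captured URL per line, and that is the list the crawler walks through when it calls itself again.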
Full Code of Crawler in PHP:
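<?php
// Recursively fetch a page, follow its links up to $depth levels,
// and append the HTML of every fetched page to results.html.
function crawl_page($url, $depth = 2)
{
    static $seen = array(); // URLs already fetched, so no page is crawled twice

    if ($depth <= 0 || isset($seen[$url])) {
        return;
    }
    $seen[$url] = true;

    // file_get_contents() returns false on failure; @ hides the warning.
    $html = @file_get_contents($url);
    if ($html === false) {
        return;
    }

    // Capture the value of every double-quoted href inside an <a> tag.
    preg_match_all('~<a\s[^>]*?href="([^"]*)"~i', $html, $matches);
    foreach ($matches[1] as $newurl) {
        // Only absolute http(s) links can be fetched as they are.
        if (preg_match('~^https?://~i', $newurl)) {
            crawl_page($newurl, $depth - 1);
        }
    }
    file_put_contents('results.html', "\n\n" . $html . "\n\n", FILE_APPEND);
}

crawl_page('https://www.peepgrams.com/', 2);
?>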
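The crawler above only works with hrefs that are already full URLs, while many pages link with relative paths such as /about. One possible way to handle that is sketched below. The helper name resolve_link() is hypothetical and not part of PHP or of the code above, and it deliberately ignores cases such as ../ segments and query-only links.

```php
<?php
// Hypothetical helper: turn an href found on the page $base into an
// absolute URL, or return null when the link cannot be crawled.
function resolve_link($base, $href)
{
    if (preg_match('~^https?://~i', $href)) {
        return $href; // already absolute
    }
    // Skip fragments and other schemes such as mailto: or javascript:
    if ($href === '' || $href[0] === '#' || preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return null;
    }
    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href; // protocol-relative link
    }
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    if ($href[0] === '/') {
        return $origin . $href; // root-relative link
    }
    // Otherwise treat the link as relative to the directory of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

With a helper like this, each captured href could be passed through resolve_link($url, $newurl) before the recursive call, and links that come back as null could be skipped.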
Creating a Server Using Node:
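// Minimal HTTP server: answers every request with an HTML greeting.
var http = require('http');

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end('Hello From the Server');
}).listen(8081); // the server listens on port 8081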
Execution of the code using CMD:
Locate the file in File Explorer, then in CMD use the cd command to change to the folder that holds the file. Type node followed by the file name, for example node app.js (the JavaScript file containing the server code), to run it. The server listens on port 8081, so open localhost:8081 in a browser to see the output. Submit a screenshot of the output together with the server file in one doc file, using the link below.