Crawling using PHP

Because there is strong demand for data for market research, price intelligence, competitor analysis and so on, the demand for automating the process of extracting that data has grown as well. This is where web scraping comes in. Web scraping is the automated process of extracting data from the websites you choose and saving it in a usable format.

 


Web scraping has become so common for several reasons. First, much of the data you can view on the web is not available for download, yet you need it downloaded and in a different format. You therefore need a way to collect the data from many pages of one site, or from many sites, and that is what web scraping provides.

Web scraping is also needed because you do not have time to work out how to download, copy and save the data you see on a page by hand. What you need is a simple, automated way of extracting whatever data you see on a web page, and that is web scraping. Apart from giving you the data you need, it saves you the many hours of work you would otherwise spend collecting that data manually.

Sometimes the source site offers no API, and in that case web scraping is the best way to extract the data.

Web Crawler/Scraper using PHP

In PHP you can find many applications built for this purpose. Before going further, the manual gives the code below for crawling HTML pages on the internet.

In PHP, file_get_contents($url) takes the URL of an online resource, downloads the page and returns its entire source code as a string. preg_match_all() takes a regular expression describing what you are looking for in that page (for example, link tags), the string to search, and a third variable, $matches, in which it stores every match it finds; $matches[1] holds the text captured by the first parenthesised group, here the link targets. The crawler then loops over those links and calls itself on each one.
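To see what preg_match_all() collects, here is a rough JavaScript (Node.js) analogue that runs on a hard-coded HTML string instead of a downloaded page. The function name extractLinks and the sample HTML are illustrative only. The pattern is a slightly stricter version of the one in the PHP code: it stays inside a single <a ...> tag, so it cannot pick up an href from a later, unrelated tag.

```javascript
// Hypothetical helper: a JavaScript counterpart of the PHP
// preg_match_all() call, applied to an HTML string.
function extractLinks(html) {
  // Capture the href="..." value of each <a> tag (g = all matches, i = any case).
  const pattern = /<a\s[^>]*?href="([^"]*)"/gi;
  const links = [];
  for (const match of html.matchAll(pattern)) {
    // match[1] is the first capture group, like $matches[1] in PHP.
    links.push(match[1]);
  }
  return links;
}

const sample = '<p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a></p>';
console.log(extractLinks(sample)); // → [ 'https://example.com/a', '/b' ]
```

Note that the second link comes back as the relative path /b. A crawler has to resolve such paths against the page URL before it can download them.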

Full Code of the Crawler in PHP:

 

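<?php

// Fetch $url, append its HTML to results.html, then follow the links
// it contains until $depth reaches 0. $seen records visited URLs so the
// same page is never downloaded twice.
function crawl_page($url, $depth = 2, array &$seen = [])
{
    if ($depth <= 0 || isset($seen[$url])) {
        return;
    }
    $seen[$url] = true;

    // Download the page source as a string; skip the page if the request fails.
    $html = @file_get_contents($url);
    if ($html === false) {
        return;
    }

    file_put_contents('results.html', "\n\n" . $html . "\n\n", FILE_APPEND);

    // $matches[1] receives the href="..." value of every <a> tag.
    preg_match_all('~<a\s[^>]*?href="([^"]*)"~i', $html, $matches);

    foreach ($matches[1] as $newurl) {
        // Follow only absolute http(s) links. Relative links such as "/about"
        // would have to be resolved against $url first, and values like
        // "#top" or "mailto:..." are not pages to fetch.
        if (preg_match('~^https?://~i', $newurl)) {
            crawl_page($newurl, $depth - 1, $seen);
        }
    }
}

crawl_page('https://www.peepgrams.com/', 2);

?>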
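The depth-limited recursion in crawl_page() can be tried without network access. The sketch below re-expresses it in JavaScript (Node.js, as used in the server section) over a hypothetical in-memory set of pages under the made-up host site.test, which stands in for file_get_contents(). It includes two safeguards: a seen set, so no URL is fetched twice, and resolution of relative links such as /about against the current page using the standard URL class. The names pages, extractLinks and crawlPage are illustrative.

```javascript
// Hypothetical in-memory "web": URL -> HTML. It replaces real downloads
// so the crawl logic can run offline.
const pages = {
  'https://site.test/': '<a href="/about">About</a> <a href="https://site.test/blog">Blog</a>',
  'https://site.test/about': '<a href="/">Home</a>',
  'https://site.test/blog': '<a href="/blog/post-1">Post 1</a>',
  'https://site.test/blog/post-1': '<a href="/">Home</a>',
};

// Collect the href value of every <a> tag.
function extractLinks(html) {
  return [...html.matchAll(/<a\s[^>]*?href="([^"]*)"/gi)].map((m) => m[1]);
}

// Depth-limited crawl: visit a page, then each of its links with depth - 1.
// `seen` keeps the same URL from being visited twice.
function crawlPage(url, depth = 2, seen = new Set()) {
  if (depth <= 0 || seen.has(url)) return seen;
  seen.add(url);
  const html = pages[url];
  if (html === undefined) return seen; // like a failed file_get_contents()
  for (const href of extractLinks(html)) {
    // Resolve relative links such as "/about" against the current page URL.
    crawlPage(new URL(href, url).href, depth - 1, seen);
  }
  return seen;
}

console.log([...crawlPage('https://site.test/', 2)]);
```

With a depth of 2 the crawl visits the start page, /about and /blog. /blog/post-1 is two links away from the start page, so it is only reached when the depth is raised to 3.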

Creating a Server Using Node:

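// app.js: a minimal web server using Node's built-in http module.
var http = require('http');

http.createServer(function (req, res) {
  // Reply to every request with status 200 and a short HTML body.
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end('Hello From the Server');
}).listen(8081, function () {
  // Runs once the server is listening.
  console.log('Server running at http://localhost:8081/');
});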

Running the Code from CMD:

Locate the file in File Explorer, then in CMD use the cd command to change to the folder that contains it. Type node followed by the file name, for example node app.js (the JavaScript file holding the code above), to start the server on port 8081, and open localhost:8081 in your browser to see the output. Put the output screenshot and the code file in one doc file and submit it using the server submission link below.
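Besides opening localhost:8081 in a browser, you can check the response from a script. The sketch below is a self-contained Node.js check, not part of the task: it starts the same handler on a temporary port chosen by the operating system (so it does not clash with a server already running on 8081), requests the page once with http.get, and logs the status code, Content-Type and body. The names handler and checkServer are illustrative.

```javascript
const http = require('http');

// Same request handler as app.js: status 200 and a short HTML body.
function handler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end('Hello From the Server');
}

// Start the handler on a free local port (port 0 lets the OS pick one),
// send one GET request the way a browser would, then stop the server.
function checkServer() {
  return new Promise((resolve, reject) => {
    const server = http.createServer(handler);
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      // agent: false opens a one-off connection that is not kept alive.
      const req = http.get(`http://127.0.0.1:${port}/`, { agent: false }, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
          server.close();
          resolve({ status: res.statusCode, type: res.headers['content-type'], body });
        });
      });
      req.on('error', (err) => {
        server.close();
        reject(err);
      });
    });
  });
}

checkServer().then((result) => console.log(result));
```

If the logged body matches what the browser shows at localhost:8081, the server code behaves as expected.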


 Crawler Task Submission Link:

Click for submission

Server Task Submission Link:

Click for submission

