When many requests hit a website from the same IP address in a short time, the server may treat the client as a robot and block that IP from accessing the site.
A second common problem: a user wants to access a website from a country that the site does not serve.
A proxy server helps in both situations, because the website sees the proxy's IP address instead of yours.
Public proxies go offline often, so we need a list of proxies that are currently active.
In this post we build a proxy server collector in PHP that gathers active proxy server addresses from the websites we define.
Tools and techniques used
First, let's look at an example of how to fetch a web page through a proxy server.
[sourcecode language="php"]
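<?php
// Settings
$proxy_server = '64.31.22.131:7808'; // Replace with a proxy that is currently working
$url = 'https://torvpn.com/proxylist.html';
$timeout = 30; // cURL timeout in seconds
// Fetch the page through the proxy
$html = getHTML($url, $timeout, $proxy_server);
// Show the page content, or a message if the request failed
echo ($html !== false) ? $html : 'Request failed';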
// Fetch $url through $proxy_server and return the body, or false on failure
function getHTML($url, $timeout, $proxy_server){
    $ch = curl_init($url);
    $agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 FirePHP/0.7.4';
    curl_setopt($ch, CURLOPT_PROXY, $proxy_server); // Route the request through the proxy
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_HEADER, FALSE); // Do not include response headers in the output
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Skips certificate checks; insecure, for testing only
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); // Value is in seconds
    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}
?>
[/sourcecode]
When you scrape a page with this code, the website sees the request coming from 64.31.22.131 (the proxy) instead of your real IP address, as long as the proxy does not forward your address in its request headers.
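The getHTML() function above returns false when the proxy is dead, with no hint about what went wrong. As a minimal sketch of how the failure reason could be reported, the helper below wraps the same cURL calls and returns the error text. The function name fetchThroughProxy() and the shape of its return array are assumptions made for this illustration, not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): fetch $url through
// $proxy and report why the request failed instead of returning a bare false.
function fetchThroughProxy(string $url, string $proxy, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_PROXY          => $proxy,          // "ip:port" of the proxy
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // seconds allowed to reach the proxy
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole request
    ]);

    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    // The handle is released when $ch goes out of scope.

    if ($body === false || $errno !== 0) {
        $message = ($error !== '') ? $error : 'cURL error ' . $errno;
        return ['ok' => false, 'body' => null, 'error' => $message];
    }
    return ['ok' => true, 'body' => $body, 'error' => null];
}
```

A caller can then test $result['ok'] and print $result['error'] for a dead proxy, which makes it easier to tell a refused connection from a timeout.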
[sourcecode language="php"]
<?php
// Collect proxy addresses from a listing page and print the ones that accept connections
include('lib/SimpleHTMLDom/simple_html_dom.php');
$timeout = 15; // cURL timeout in seconds
$proxy_website = 'http://proxy-list.org/en/';
//-->> Regular expression that matches an IP with port (example: 192.168.17.252:8080)
$groupExpression = '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}\z/';
$address_one = array();
//-->> Fetch the page with getHTML() and parse it with str_get_html()
$page = getHTML($proxy_website, $timeout);
$proxy_server_html = ($page !== false) ? str_get_html($page) : false;
//-->> Get every td element in the proxy table rows (adjust the selector to the page's markup)
$proxy_Server_port = $proxy_server_html ? $proxy_server_html->find('table[width=488] tr[class=RegularText] td') : array();
if(!empty($proxy_Server_port)){
    foreach($proxy_Server_port as $cell){
        $data = trim($cell->innertext);
        //-->> Keep only cells that look like ip:port
        if(preg_match($groupExpression, $data)){
            $address_one = explode(':', $data); // Separate IP and port
            //-->> Print the address only if the proxy accepts connections
            if(proxyCheck($address_one)){
                echo "Proxy Address: ".$address_one[0].":".$address_one[1]."<br>";
            }
        }
    }
}else{
    echo "Your Request Data Is Empty";
}
// Check whether the proxy accepts a TCP connection within 10 seconds
// $data is array(ip, port)
function proxyCheck($data){
    $con = @fsockopen($data[0], (int)$data[1], $errno, $errstr, 10);
    if($con){
        fclose($con);
        return true;
    }
    return false;
}
// Fetch $url and return the page HTML as a string, or false on failure
function getHTML($url, $timeout){
    $ch = curl_init($url);
    $agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 FirePHP/0.7.4';
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_HEADER, FALSE); // Do not include response headers in the output
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Skips certificate checks; insecure, for testing only
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); // Value is in seconds
    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}
[/sourcecode]
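The regular expression in the script checks only the shape of an address, so a string such as 999.999.999.999:80 would still pass. As a hedged sketch of a stricter check, the helper below combines the pattern with PHP's built-in filter_var() and a port range test. The function name parseProxyAddress() is an assumption introduced for this example.

```php
<?php
// Hypothetical helper: returns array(ip, port) for a valid "ip:port" string,
// or null when the text is not a usable IPv4 address with a port.
function parseProxyAddress(string $text): ?array
{
    // Shape check: four dot-separated number groups, a colon, then the port.
    if (!preg_match('/^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\z/', trim($text), $m)) {
        return null;
    }
    $ip   = $m[1];
    $port = (int) $m[2];

    // The pattern alone accepts octets above 255, so let filter_var() verify them.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    // Valid TCP ports are 1 to 65535.
    if ($port < 1 || $port > 65535) {
        return null;
    }
    return [$ip, $port];
}
```

The returned array has the same layout as the result of explode(':', $data), so it could be passed to a connection check like proxyCheck() without further changes.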
The selector passed to the find() method depends on the markup of the page and on the element you want to scrape.
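To illustrate how the expression follows the markup, here is a small sketch that runs two different expressions against the same sample table. It uses PHP's built-in DOMDocument and DOMXPath classes with XPath syntax rather than Simple HTML DOM, so it runs without the external library. The sample HTML and the function name extractCells() are assumptions made for this example; a real proxy listing page will be laid out differently.

```php
<?php
// Assumed sample markup, loosely modelled on a proxy listing table.
$html = <<<HTML
<table width="488">
  <tr class="Header"><td>Address</td><td>Type</td></tr>
  <tr class="RegularText"><td>192.168.17.252:8080</td><td>HTTP</td></tr>
  <tr class="RegularText"><td>10.0.0.7:3128</td><td>HTTPS</td></tr>
</table>
HTML;

// Hypothetical helper: return the trimmed text of every node matched by $xpathExpression.
function extractCells(string $html, string $xpathExpression): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ((new DOMXPath($doc))->query($xpathExpression) as $node) {
        $cells[] = trim($node->textContent);
    }
    return $cells;
}

// Only the first cell of each data row: the addresses.
$firstColumn = extractCells($html, '//table[@width="488"]//tr[@class="RegularText"]/td[1]');
// Every cell of each data row: addresses and types.
$allCells = extractCells($html, '//tr[@class="RegularText"]/td');
```

Changing only the expression changes what is collected, which is why the selector has to be rewritten for each proxy listing site you add.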