Scraping Proxy IPs From Proxy Websites With PHP

Published on : April 27, 2026



When a single IP address requests a website or page too frequently, the server may treat the client as a robot and block that IP from accessing the site.

Another common situation: a user wants to reach a website from a country that the website does not serve.

A proxy server helps in both cases, because the request reaches the website from the proxy's IP address instead of your own.

For that we need a proxy server that is currently active.

We are going to build a proxy collector in PHP that gathers active proxy server addresses from the websites we define.

Tools and Techniques Used

  • Scraping (collecting) the proxy website data using cURL
  • Parsing the data using Simple HTML DOM Parser and regular expressions

First, an Example of Collecting Data From a Web Page Through a Proxy Server

[sourcecode language="php"]
<?php
// All variables
$proxy_server = '64.31.22.131:7808'; // Use a working proxy server
$url = 'https://torvpn.com/proxylist.html';
$timeout = 30; // cURL timeout in seconds (CURLOPT_TIMEOUT is not in milliseconds)

// Get the HTML
$html = getHTML($url, $timeout, $proxy_server);

// View the page content
echo $html;

// Function to get the HTML of a page through a proxy server
function getHTML($url, $timeout, $proxy_server){
    $ch = curl_init($url);
    $agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 FirePHP/0.7.4';
    curl_setopt($ch, CURLOPT_PROXY, $proxy_server);   // Send the request through the proxy
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);          // Leave response headers out of the output
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);  // Skips certificate checks; demo use only
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);   // Return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $content = curl_exec($ch);                        // false when the request fails
    curl_close($ch);
    return $content;
}
?>
[/sourcecode]

If you use this code to scrape a page, the website's server sees the request coming from 64.31.22.131, and your real IP address stays hidden (provided the proxy does not pass it along in a request header).
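A free proxy can stop responding at any time, in which case curl_exec() returns false rather than a page. The following is a minimal sketch of a proxied request that reports such a failure as null. The helper names proxyRequestOptions() and fetchViaProxy() and the 10-second connect timeout are assumptions made for illustration; they are not part of the original code.

[sourcecode language="php"]
<?php
// Hypothetical helper: builds the cURL options for a request sent through $proxy.
function proxyRequestOptions(string $proxy, int $timeoutSeconds = 30): array
{
    return [
        CURLOPT_PROXY          => $proxy,           // "host:port" of the proxy server
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,               // seconds allowed to reach the proxy
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // seconds allowed for the whole transfer
    ];
}

// Hypothetical helper: fetches $url through $proxy.
// Returns the response body, or null when the request fails.
function fetchViaProxy(string $url, string $proxy): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, proxyRequestOptions($proxy));
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
[/sourcecode]

Calling fetchViaProxy('https://torvpn.com/proxylist.html', '64.31.22.131:7808') and checking the result for null lets the script move on to another proxy instead of echoing an empty page.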

Step by Step: Collecting Proxies From a Website

  • Now we collect active proxy addresses from free proxy provider websites (the example below handles a single site).
  • First, include the parsing library, Simple HTML DOM Parser.
  • Define the timeout and the URL of the proxy provider website.
  • Define a regular expression that detects a proxy address (e.g. 192.168.17.125:8080).
  • Fetch the content of the page with cURL (the getHTML() function).
  • Convert it to a DOM object with the simple_html_dom library's str_get_html() function.
  • Find every td element with the find("……") method and read its inner text.
  • Check each td to see whether it matches the IP address regular expression.
  • Finally, check whether each proxy address is active with the proxyCheck() function.
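The address-matching step can be tried on its own before any page is fetched. Below is a minimal sketch, assuming a hypothetical helper named parseProxyAddress() that is not part of the original code. A digit pattern alone would accept an address such as 999.1.1.1:8080, so the sketch adds a range check on the octets and on the port.

[sourcecode language="php"]
<?php
// Hypothetical helper: returns [ip, port] for a valid "ip:port" string, or null.
function parseProxyAddress(string $text): ?array
{
    $text = trim($text);
    if (!preg_match('/^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\z/', $text, $m)) {
        return null; // not shaped like ip:port
    }
    $port = (int) $m[2];
    // filter_var() rejects octets above 255, which the pattern alone lets through.
    if (filter_var($m[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false
        || $port < 1 || $port > 65535) {
        return null;
    }
    return [$m[1], $port];
}

var_dump(parseProxyAddress('192.168.17.125:8080')); // well-formed address
var_dump(parseProxyAddress('999.1.1.1:8080'));      // octet out of range
var_dump(parseProxyAddress('10.0.0.1:70000'));      // port out of range
[/sourcecode]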

[sourcecode language="php"]
<?php
include('lib/SimpleHTMLDom/simple_html_dom.php');
$timeout = 15; // cURL timeout in seconds
$proxy_website = 'http://proxy-list.org/en/';

// Expression that matches an IP with a port (example: 192.168.17.252:8080)
$groupExpression = '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}\z/';
$address_one = array();

// First call getHTML(), then build a DOM object with str_get_html()
$page = getHTML($proxy_website, $timeout);
$proxy_server_html = $page ? str_get_html($page) : false;

// Get every td element that may hold an address.
// The selector must match the markup of the target page.
$proxy_Server_port = $proxy_server_html
    ? $proxy_server_html->find('table[width="488"] tr[class="RegularText"] td')
    : array();

if(!empty($proxy_Server_port)){
    foreach($proxy_Server_port as $cell){
        $data = trim($cell->innertext);

        // Match against the IP regular expression
        if(preg_match($groupExpression, $data)){
            $address_one = explode(':', $data); // Separate the IP and the port

            // Check whether the proxy is reachable
            if(proxyCheck($address_one)){
                echo "Proxy Address: ".$address_one[0].":".$address_one[1]."<br>";
            }
        }
    }
}else{
    echo "Your Request Data Is Empty";
}

// Function to check the status of a proxy server:
// tries to open a TCP connection to it within 10 seconds
function proxyCheck($data){
    if($con = @fsockopen($data[0], (int) $data[1], $errno, $errstr, 10)){
        fclose($con);
        return true;
    }else{
        return false;
    }
}

// Get the website data as HTML

function getHTML($url, $timeout){
    $ch = curl_init($url);
    $agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 FirePHP/0.7.4';
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);  // Skips certificate checks; demo use only
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $content = curl_exec($ch);                        // false when the request fails
    curl_close($ch);
    return $content;
}
[/sourcecode]
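The code above handles a single site. To collect from several sites, the results need to be merged and duplicates dropped, since the same proxy often appears on more than one list. The sketch below is one possible way to do that. The helpers extractProxies() and collectProxies(), the example.com URLs, and the canned pages are all assumptions made so the sketch runs offline. It also uses a plain regular expression over the raw HTML instead of Simple HTML DOM, which only works when the IP and port appear together as text; sites that put them in separate cells still need a selector.

[sourcecode language="php"]
<?php
// Hypothetical helper: pulls every "ip:port" token out of a page's HTML.
function extractProxies(string $html): array
{
    preg_match_all('/\b\d{1,3}(?:\.\d{1,3}){3}:\d{2,5}\b/', $html, $matches);
    return $matches[0];
}

// Hypothetical helper: runs $fetch on each URL and merges the results
// without duplicates. $fetch takes a URL and returns the page body or false.
function collectProxies(array $urls, callable $fetch): array
{
    $found = [];
    foreach ($urls as $url) {
        $html = $fetch($url);
        if (!is_string($html) || $html === '') {
            continue; // skip sites that failed to load
        }
        foreach (extractProxies($html) as $address) {
            $found[$address] = true; // array keys remove duplicates
        }
    }
    return array_keys($found);
}

// Stand-in fetcher with canned pages (invented data, no network access).
$pages = [
    'http://example.com/a' => '<td>10.0.0.1:8080</td><td>10.0.0.2:3128</td>',
    'http://example.com/b' => '<li>10.0.0.2:3128</li>',
    'http://example.com/c' => false, // simulates a site that is down
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url];
};

print_r(collectProxies(array_keys($pages), $fetch));
[/sourcecode]

In a real run, $fetch would wrap the getHTML() function with the chosen timeout, and each collected address would still go through proxyCheck() before use.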

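The expression passed to the find() method depends on the element which you want to scrape, so inspect the HTML of the target page to see which table, row and cell hold the addresses.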
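To show how the selector follows the page markup, here is a minimal sketch on an invented HTML fragment. It uses PHP's built-in DOMDocument and DOMXPath classes in place of Simple HTML DOM so that it runs without the library. The markup, the class names, and the helper selectText() are assumptions for illustration only.

[sourcecode language="php"]
<?php
// Sample markup standing in for a proxy list page (invented for illustration).
$html = <<<HTML
<table class="proxies">
  <tr><th>Address</th><th>Country</th></tr>
  <tr class="row"><td>10.0.0.1:8080</td><td>US</td></tr>
  <tr class="row"><td>10.0.0.2:3128</td><td>DE</td></tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of every node matched by $xpath.
function selectText(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so parser warnings are collected silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// First cell of every row with class "row" inside the "proxies" table.
print_r(selectText($html, '//table[@class="proxies"]//tr[@class="row"]/td[1]'));
[/sourcecode]

With Simple HTML DOM, a selector along the lines of find('table[class="proxies"] tr[class="row"] td') would target the same rows, though it returns every td in them rather than only the first.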

[Figure: an example showing how to get the correct data from a page. Caption: Scraping proxy addresses from a free proxy website]
