Create Request object, fetch page.
$request = new Request();
$response = $request->get('google.com');
Optionally pass a client object to the Request. Options such as cookie behaviour and custom headers are set on the client object.
The setters return the client, so the calls can also be chained.
$client = new CurlClient();
$client->setRedirects(4)->setUserAgent("Bro bot");
$req = new Request($client);
// $req->setClient($client); // Alternative
$res = $req->get('google.com');
The default client is GuzzleClient. New clients are easy to create, and an existing client's behaviour and settings can be modified on the fly.
Create a client, then modify its options.
This is only available with GuzzleClient and CurlClient, not with the custom clients.
$broBot = $client->setRedirects(4)->setUserAgent("Bro bot");
For more advanced functionality, extend a client class. Here the URL needs to be modified based on some options.
class AddPageNumClient extends CurlClient
{
    public function get(string $url, array $options = []): Response
    {
        // Append the page number when the caller supplies one.
        if (!empty($options['page_num'])) {
            $url = $url."?page=".$options['page_num'];
        }
        return parent::get($url);
    }
}
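The URL-building step in AddPageNumClient uses plain concatenation, which assumes the URL has no query string yet. A minimal sketch of a more defensive version follows; `appendQueryParams` is a hypothetical helper, not part of the library, and it relies only on PHP's built-in `http_build_query()`.

```php
<?php
/**
 * Hypothetical helper (not part of the library): adds query parameters
 * to a URL whether or not it already has a query string or a fragment.
 */
function appendQueryParams(string $url, array $params): string
{
    if ($params === []) {
        return $url;
    }

    // Split off a fragment so the parameters land before it.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    // Use '&' when a query string is already present, '?' otherwise.
    $separator = strpos($url, '?') === false ? '?' : '&';

    // http_build_query() URL-encodes keys and values.
    return $url . $separator . http_build_query($params) . $fragment;
}

echo appendQueryParams('google.com', ['page' => 2]), PHP_EOL;
echo appendQueryParams('google.com/search?q=php', ['page' => 2]), PHP_EOL;
```

Inside an overridden get(), the concatenation line could then become `$url = appendQueryParams($url, ['page' => $options['page_num']]);`.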
Other use cases could be: modify https,
The custom clients (CurlCustomClient and GuzzleCustomClient) are thin wrappers around the underlying curl and Guzzle clients. Use them when you need full control. They do not have the ->set..() functions.
There are three ways to use them:
// Pass 'raw' client
$ch = curl_init();
$curlCustom = new CurlCustomClient();
$curlCustom->setRawClient($ch);
// Pass full configuration
$client = new CurlCustomClient();
$client->setCustomOptions([
CURLOPT_HEADER => 1,
CURLOPT_NOBODY => 1,
]);
// Extend the CurlCustomClient OR GuzzleCustomClient class and set $customClientOptions
class OnlyHeadClient extends CurlCustomClient
{
public array $customClientOptions = [
CURLOPT_HEADER => 1,
CURLOPT_NOBODY => 1,
CURLOPT_USERAGENT => 'only head',
];
}
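For readers unfamiliar with these curl options, here is a small sketch using plain ext-curl rather than the library. The `$defaults` array is a made-up example, since the library's real defaults are not shown here. It also illustrates one pitfall when layering custom options over defaults: CURLOPT_* constants are integers, so `array_merge()` would renumber them, while `array_replace()` keeps the keys.

```php
<?php
// Hypothetical defaults; the library's actual defaults are not documented here.
$defaults = [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_USERAGENT      => 'default agent',
];

// Custom options in the same shape as the examples above.
$custom = [
    CURLOPT_HEADER    => 1,            // include response headers in the output
    CURLOPT_NOBODY    => 1,            // skip the body (curl sends a HEAD request)
    CURLOPT_USERAGENT => 'only head',
];

// array_replace() preserves the integer CURLOPT_* keys and lets $custom win.
// array_merge() would renumber integer keys and destroy the option map.
$options = array_replace($defaults, $custom);

$ch = curl_init();
$applied = curl_setopt_array($ch, $options);
// curl_exec($ch) would now return only the response headers.
```

No request is sent in this sketch; it only builds and applies the option set.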
A request returns a Response object.
This object contains the response body and other fields such as the response headers and HTTP code.
TODO: add 'nodebug/light' mode to skip extra data like httpcode
The Response object accepts callbacks that manipulate the response body.
$response = $request->get('google.com');
$response->modBody(['toabs','tidy']);
$body = $response->getBody();
TODO: add common modifications, like toabsurls
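To illustrate the idea of body callbacks, here is a minimal sketch of how a chain of modifiers could be applied to a body string. It is not the library's modBody() implementation; `applyBodyModifiers`, the `$toAbsolute` closure and the `https://example.com` base URL are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical stand-in for a body-modifier pipeline.
 *
 * @param callable[] $modifiers each takes the body string and returns a new one
 */
function applyBodyModifiers(string $body, array $modifiers): string
{
    foreach ($modifiers as $modifier) {
        $body = $modifier($body);
    }
    return $body;
}

// Example modifier: turn root-relative href/src values into absolute URLs.
// Protocol-relative values ("//host/...") are left untouched.
$toAbsolute = function (string $body): string {
    $result = preg_replace('~\b(href|src)="/(?!/)~', '$1="https://example.com/', $body);
    return $result ?? $body; // preg_replace() returns null on error
};

$html = ' <a href="/about">About</a> <img src="//cdn.example.com/x.png"> ';
$modified = applyBodyModifiers($html, [$toAbsolute, 'trim']);
echo $modified, PHP_EOL;
```

Any callable that maps a string to a string fits this shape, including built-ins such as `trim`.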
Set conditions to analyze a response and find out why it failed to fetch the right page. This can be used to detect:
- Faults: incorrect response codes (404 etc.) or being blocked by a firewall.
- Unexpected output: a missing set-cookie header or unexpected JS redirects.
$req = new Request();
$res = $req->enableCookies()->get($url);
// Create a ResponseDebug object and set some failure conditions.
$debug = new ResponseDebug();
$debug->setGoodHttpCode(200)
    ->setBadStrings(["blocked"])
    ->setGoodStrings(["</html>"]) // If this is not found, the response is considered failed.
    ->setGoodRegex(['/\d\d\d\d\d/'])
    ->setContainExpectedHeaders(['set-cookie: ','content-type: application/json']);
$debug->setResponse($res);
if ($debug->isFail()) {
    $failAr = $debug->getFailDetail();
    // The ->setBadStrings(["blocked"]) condition matched.
    if (isset($failAr['bad_string']) && stristr($failAr['bad_string'], 'blocked')) {
        // Modify the request to use a more expensive proxy and fetch again.
        $res = $req->setProxy($priceProxy)->get($url);
    }
}
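The kind of checks described above can be sketched as a single function. This is an illustration only, not the ResponseDebug source: `findFailures`, the rule names and every result key except `bad_string` (which appears in the example above) are assumptions.

```php
<?php
/**
 * Hypothetical sketch of rule evaluation; not the library's ResponseDebug code.
 * Returns a map of failed rule => detail. An empty array means no failure.
 */
function findFailures(int $httpCode, string $headers, string $body, array $rules): array
{
    $failures = [];

    if (isset($rules['good_http_code']) && $httpCode !== $rules['good_http_code']) {
        $failures['http_code'] = (string) $httpCode;
    }
    // A bad string must NOT appear in the body.
    foreach ($rules['bad_strings'] ?? [] as $needle) {
        if (stripos($body, $needle) !== false) {
            $failures['bad_string'] = $needle;
            break;
        }
    }
    // A good string MUST appear in the body.
    foreach ($rules['good_strings'] ?? [] as $needle) {
        if (stripos($body, $needle) === false) {
            $failures['good_string'] = $needle;
            break;
        }
    }
    // Every good regex MUST match the body.
    foreach ($rules['good_regex'] ?? [] as $pattern) {
        if (preg_match($pattern, $body) !== 1) {
            $failures['good_regex'] = $pattern;
            break;
        }
    }
    // Every expected header MUST appear in the raw header block.
    foreach ($rules['expected_headers'] ?? [] as $header) {
        if (stripos($headers, $header) === false) {
            $failures['expected_header'] = $header;
            break;
        }
    }
    return $failures;
}

$rules = [
    'good_http_code'   => 200,
    'bad_strings'      => ['blocked'],
    'good_strings'     => ['</html>'],
    'expected_headers' => ['set-cookie: '],
];
$failures = findFailures(403, "Content-Type: text/html\r\n", '<html>You are BLOCKED</html>', $rules);
print_r($failures);
```

Returning the failed rule names lets the caller choose a recovery step per failure type, such as switching proxy when a bad string is found.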
It's important that the user can utilize the full power of Guzzle while still using this library, so there are several ways in which it can be used.
$goodBotClient = new GuzzleClient();
$goodBotClient->setRedirects(4)->setUserAgent("Good bot");
$req = new Request($goodBotClient);
// $req->setClient($goodBotClient); // Alternative
$res = $req->get('google.com');
class AddPageNumClient extends GuzzleClient
{
    public function get(string $url, array $options = []): Response
    {
        // Append the page number when the caller supplies one.
        if (!empty($options['page_num'])) {
            $url = $url."?page=".$options['page_num'];
            return parent::get($url);
        }
        return parent::get($url, $options);
    }
}
$goodBotClient = new AddPageNumClient();
$req = new Request($goodBotClient);
$res = $req->get('google.com',['page_num' => 2]); // Fetches google.com?page=2
GuzzleCustomClient allows setting options on the Guzzle object directly, so you interact with the internal Guzzle object used by the library. PS: These clients do not have the set_ functions, so you have to set the options directly.
class OnlyHeadGuzzleClient extends GuzzleCustomClient
{
public array $clientOptions = [
'headers' => [
'head5' => 'value',
],
];
}
$req = new Request();
$req->setClient(new OnlyHeadGuzzleClient());
$onlyHeadRes = $req->get('https://manytools.org/http-html-text/http-request-headers');
Create your own Guzzle client, then pass it to the library.
// Create a regular guzzle client
$myGuzzle = new \GuzzleHttp\Client(['base_uri' => 'https://manytools.org/', 'headers' => ['User-Agent' => "raw guzzle"]]);
// Create a GuzzleCustomClient
$guzzleCustom = new GuzzleCustomClient();
$guzzleCustom->setRawClient($myGuzzle);
// Attach the customClient to a request
$req = new Request();
$req->setClient($guzzleCustom);
// Get
$res = $req->get('http-html-text/http-request-headers');
PS: In a real application, all the clients could be created once in one place and then reused in different requests
via $request->setClient().
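One way to keep client creation in one place is a small registry that builds each client on first use. The sketch below is an assumption about how an application might organize this, not something the library provides; `ClientRegistry` and `FakeClient` are hypothetical names, with `FakeClient` standing in for a configured client such as CurlClient. It needs PHP 8.0 or later.

```php
<?php
/**
 * Hypothetical registry (not part of the library): builds each client
 * on first use and hands back the same instance afterwards.
 */
final class ClientRegistry
{
    /** @var array<string, callable> */
    private array $factories = [];
    /** @var array<string, object> */
    private array $instances = [];

    public function register(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;
        unset($this->instances[$name]); // drop an instance built by an older factory
    }

    public function get(string $name): object
    {
        if (!isset($this->factories[$name])) {
            throw new InvalidArgumentException("Unknown client: $name");
        }
        // Build lazily, then cache the instance.
        return $this->instances[$name] ??= ($this->factories[$name])();
    }
}

// Stand-in for a configured client such as CurlClient.
final class FakeClient
{
    public function __construct(public string $userAgent) {}
}

$clients = new ClientRegistry();
$clients->register('good_bot', fn () => new FakeClient('Good bot'));

$first  = $clients->get('good_bot');
$second = $clients->get('good_bot');
var_dump($first === $second); // the same instance is reused
```

With the library's classes, a request would then pick up a shared client with `$request->setClient($clients->get('good_bot'))`.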