<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>cURL - Devhour</title>
	<atom:link href="https://www.devhour.net/tag/curl/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.devhour.net</link>
	<description>Taking time to write about development</description>
	<lastBuildDate>Tue, 12 Mar 2024 10:01:42 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.4.3</generator>

<image>
	<url>https://www.devhour.net/wp-content/uploads/2024/03/cropped-devhourlogo-32x32.png</url>
	<title>cURL - Devhour</title>
	<link>https://www.devhour.net</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Scraping data with PHP and cURL</title>
		<link>https://www.devhour.net/scraping-data-with-php-and-curl/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=scraping-data-with-php-and-curl</link>
					<comments>https://www.devhour.net/scraping-data-with-php-and-curl/#respond</comments>
		
		<dc:creator><![CDATA[Jamie]]></dc:creator>
		<pubDate>Sun, 12 May 2013 12:17:00 +0000</pubDate>
				<category><![CDATA[cURL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Dev Hour]]></category>
		<category><![CDATA[Web Development]]></category>
		<guid isPermaLink="false">https://www.devhour.net/?p=20</guid>

					<description><![CDATA[<p>I’m working on another idea which I hope to release soon which involves scraping websites using PHP and cURL. I don’t want to give too much away before I release the website so I won’t go into too much detail. However, what I can tell you is that it required me to go out and [&#8230;]</p>
<p>The post <a href="https://www.devhour.net/scraping-data-with-php-and-curl/">Scraping data with PHP and cURL</a> first appeared on <a href="https://www.devhour.net">Devhour</a>.</p>]]></description>
										<content:encoded><![CDATA[<p>I’m working on another idea, which I hope to release soon, that involves scraping websites using PHP and cURL.</p>



<p>I don’t want to give too much away before I release the website, so I won’t go into too much detail. What I can tell you is that it required me to go out and get a lot of data from external websites using variables passed through from a form on my end.</p>



<p>I originally started out using a piece of Python software called <a href="https://scrapy.org/">Scrapy</a>, which worked very well, but the logistics of using it and then either storing the data or displaying it on a webpage became too much of a hassle, so I opted for PHP and cURL instead.</p>



<p>For the PHP side I’m using <a href="https://www.codeigniter.com/">CodeIgniter</a>, a very easy and very speedy framework that is perfect for what I wanted to do.</p>



<p>The basic flow of how everything works is:</p>



<ol>
<li>The form is filled out and submitted</li>



<li>Data from the form is sent to the external website in a cURL request</li>



<li>The webpage content is then returned</li>



<li>From there the data can be formatted and displayed accordingly</li>
</ol>



<p>Doing this with PHP and cURL is a fairly straightforward process and I’ll show you how to go about it. The only real issue you may come across is that when forms come into play you need to make sure each and every form element is included in the call.</p>
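


<p>One way to avoid missing a form element is to read the field names straight out of the form’s HTML. The following is only a minimal sketch under a few assumptions: the helper name collect_form_fields and the sample form are made up for illustration, it only looks at input elements in the first form on the page, and it relies on PHP’s DOM extension being available.</p>



<pre><code>&lt;?php
// Hypothetical helper: gather every named input in the first form of a page
// so that hidden fields (tokens and the like) are not forgotten.
function collect_form_fields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so keep parser warnings quiet.
    libxml_use_internal_errors(true);
    $doc-&gt;loadHTML($html);
    libxml_clear_errors();

    $fields = array();
    $form = $doc-&gt;getElementsByTagName('form')-&gt;item(0);
    if ($form === null) {
        return $fields; // no form on the page
    }
    foreach ($form-&gt;getElementsByTagName('input') as $input) {
        $name = $input-&gt;getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input-&gt;getAttribute('value');
        }
    }
    return $fields;
}

// Made-up login form, used only for illustration.
$html = '&lt;form&gt;&lt;input type="hidden" name="token" value="abc123"&gt;'
      . '&lt;input type="text" name="username"&gt;'
      . '&lt;input type="password" name="password"&gt;&lt;/form&gt;';

// Start from what the form ships with, then overwrite with our own values.
$postdata = array_merge(
    collect_form_fields($html),
    array('username' =&gt; 'Jamie', 'password' =&gt; 'password')
);
</code></pre>



<p>Select boxes and textareas are not covered by this sketch, so a real form may need a little more work.</p>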



<div class="wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#575279;--cbp-line-number-width:calc(2 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#faf4ed"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="$url = 'http://www.website.com/login.php';
$postdata = array('username' =&gt; &quot;Jamie&quot;,'password' =&gt; &quot;password&quot;);
$ch = curl_init();
if($ch){
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
  curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file   
  curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar   
  $content = curl_exec($ch);
  $headers = curl_getinfo($ch);
  curl_close($ch);
  // Debug option
  // print_r($headers);
  if($headers['http_code'] == 200){
    echo $content;
  }
}" style="color:#575279;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki rose-pine-dawn" style="background-color: #faf4ed" tabindex="0"><code><span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">url</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;http://www.website.com/login.php&#39;</span><span style="color: #797593">;</span></span>
<span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">postdata</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">array</span><span style="color: #797593">(</span><span style="color: #EA9D34">&#39;username&#39;</span><span style="color: #575279"> </span><span style="color: #286983">=&gt;</span><span style="color: #575279"> </span><span style="color: #EA9D34">&quot;Jamie&quot;</span><span style="color: #797593">,</span><span style="color: #EA9D34">&#39;password&#39;</span><span style="color: #575279"> </span><span style="color: #286983">=&gt;</span><span style="color: #575279"> </span><span style="color: #EA9D34">&quot;password&quot;</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_init</span><span style="color: #797593">();</span></span>
<span class="line"><span style="color: #286983">if</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">){</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_URL</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">url</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_CONNECTTIMEOUT</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">15</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_RETURNTRANSFER</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">true</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_POST</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">1</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_POSTFIELDS</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">postdata</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_COOKIEFILE</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;cookies.txt&#39;</span><span style="color: #797593">);</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> set cookie file to given file   </span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_COOKIEJAR</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;cookies.txt&#39;</span><span style="color: #797593">);</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> set same file as cookie jar   </span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">content</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_exec</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">headers</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_getinfo</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_close</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #797593">  </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> Debug option</span></span>
<span class="line"><span style="color: #797593">  </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> print_r($headers);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #286983">if</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">headers</span><span style="color: #797593">[</span><span style="color: #EA9D34">&#39;http_code&#39;</span><span style="color: #797593">]</span><span style="color: #575279"> </span><span style="color: #286983">==</span><span style="color: #575279"> </span><span style="color: #D7827E">200</span><span style="color: #797593">){</span></span>
<span class="line"><span style="color: #575279">    </span><span style="color: #B4637A; font-style: italic">echo</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">content</span><span style="color: #797593">;</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #797593">}</span></span>
<span class="line"><span style="color: #797593">}</span></span></code></pre><span style="display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#faf4ed;color:#625c88;font-size:12px;line-height:1;position:relative">PHP</span></div>



<p>That’s the entire call, and it will return the HTML contents of website.com/login.php. I’ll go through the above code piece by piece and give a rundown of each of the different parts.</p>



<div class="wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#575279;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#faf4ed"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="$url = 'http://www.website.com/login.php';
$postdata = array('username' =&gt; &quot;Jamie&quot;, 'password' =&gt; &quot;password&quot;);" style="color:#575279;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki rose-pine-dawn" style="background-color: #faf4ed" tabindex="0"><code><span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">url</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;http://www.website.com/login.php&#39;</span><span style="color: #797593">;</span></span>
<span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">postdata</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">array</span><span style="color: #797593">(</span><span style="color: #EA9D34">&#39;username&#39;</span><span style="color: #575279"> </span><span style="color: #286983">=&gt;</span><span style="color: #575279"> </span><span style="color: #EA9D34">&quot;Jamie&quot;</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;password&#39;</span><span style="color: #575279"> </span><span style="color: #286983">=&gt;</span><span style="color: #575279"> </span><span style="color: #EA9D34">&quot;password&quot;</span><span style="color: #797593">);</span></span></code></pre><span style="display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#faf4ed;color:#625c88;font-size:12px;line-height:1;position:relative">PHP</span></div>



<p>Firstly, the url variable should be self-explanatory, and postdata is just a simple array containing the form elements that are required to log in (in this case a username and password).</p>
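


<p>One detail worth knowing: when CURLOPT_POSTFIELDS is given an array, the request is sent as multipart/form-data. If the target site expects a regular URL-encoded form post, the array can be turned into a string first. A small sketch of that alternative, using the same placeholder credentials:</p>



<pre><code>&lt;?php
$postdata = array('username' =&gt; 'Jamie', 'password' =&gt; 'password');

// http_build_query() URL-encodes the array into a single string.
$encoded = http_build_query($postdata);

echo $encoded; // username=Jamie&amp;password=password

// Passing the string instead of the array would look like this:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $encoded);
</code></pre>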



<div class="wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#575279;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#faf4ed"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="$ch = curl_init();if($ch){" style="color:#575279;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki rose-pine-dawn" style="background-color: #faf4ed" tabindex="0"><code><span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #575279"> </span><span 
style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_init</span><span style="color: #797593">();</span><span style="color: #286983">if</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">){</span></span></code></pre><span style="display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#faf4ed;color:#625c88;font-size:12px;line-height:1;position:relative">PHP</span></div>



<p>Create a new curl object and, if all is well, continue on.</p>



<div class="wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#575279;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#faf4ed"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar" style="color:#575279;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki rose-pine-dawn" style="background-color: #faf4ed" tabindex="0"><code><span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_URL</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">url</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_CONNECTTIMEOUT</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">15</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_RETURNTRANSFER</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">true</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_POST</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">1</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_POSTFIELDS</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">postdata</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_COOKIEFILE</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;cookies.txt&#39;</span><span style="color: #797593">);</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> set cookie file to given file</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_COOKIEJAR</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;cookies.txt&#39;</span><span style="color: #797593">);</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> set same file as cookie jar</span></span></code></pre><span style="display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#faf4ed;color:#625c88;font-size:12px;line-height:1;position:relative">PHP</span></div>



<p>These are all the curl options I am going to use. You can view the rest of the options over at the <a href="https://php.net/manual/en/function.curl-setopt.php">php website</a>. The ones we are using are basically all to do with logging in. Storing the cookies and passing through the post data are the main ones to take note of.</p>
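


<p>The same options can also be set in one go with curl_setopt_array. This is just a sketch of that alternative: the helper name build_login_options is made up, and the cookie file is placed in the system temp directory here (an assumption on my part) because a relative path like cookies.txt depends on the script’s working directory.</p>



<pre><code>&lt;?php
// Hypothetical helper: returns the options used above as a single array.
function build_login_options(string $url, array $postdata, string $cookieFile): array
{
    return array(
        CURLOPT_URL            =&gt; $url,
        CURLOPT_CONNECTTIMEOUT =&gt; 15,          // seconds to wait for the connection
        CURLOPT_RETURNTRANSFER =&gt; true,        // return the body instead of printing it
        CURLOPT_POST           =&gt; true,
        CURLOPT_POSTFIELDS     =&gt; $postdata,
        CURLOPT_COOKIEFILE     =&gt; $cookieFile, // read cookies from this file
        CURLOPT_COOKIEJAR      =&gt; $cookieFile, // write cookies back to the same file
    );
}

$options = build_login_options(
    'http://www.website.com/login.php',
    array('username' =&gt; 'Jamie', 'password' =&gt; 'password'),
    sys_get_temp_dir() . '/cookies.txt'
);

$ch = curl_init();
// curl_setopt_array() returns false if any one of the options could not be set.
$ok = curl_setopt_array($ch, $options);
</code></pre>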



<div class="wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#575279;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#faf4ed"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="$content = curl_exec($ch);
$headers = curl_getinfo($ch);
curl_close($ch);" style="color:#575279;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki rose-pine-dawn" style="background-color: #faf4ed" tabindex="0"><code><span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">content</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_exec</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">headers</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_getinfo</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #B4637A; font-style: italic">curl_close</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span></code></pre><span style="display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#faf4ed;color:#625c88;font-size:12px;line-height:1;position:relative">PHP</span></div>



<p>Last but not least, execute the cURL request with the options we set, store the returned content in a variable, and grab the request info with curl_getinfo() before finally closing the cURL handle.</p>



<p>As you will note in the original code, I use the headers variable as a debugging aid. The most useful entry is http_code. If you ever find that something isn’t working, double-check that you are getting a 200 code and not an error such as 400 or 501.</p>
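
<p>As a rough illustration of that kind of check, here is a minimal sketch that sorts the http_code entry of the curl_getinfo() array into a coarse category. The helper name classify_http_code() and the sample array are my own assumptions for the example, not part of the original script.</p>

```php
<?php
// Hypothetical helper (not from the original post): maps the 'http_code'
// value returned by curl_getinfo() to a coarse category for debugging.
function classify_http_code(int $code): string
{
    if ($code === 0) {
        return 'no response'; // curl never received an HTTP response
    }
    if ($code >= 200 && $code < 300) {
        return 'success';
    }
    if ($code >= 300 && $code < 400) {
        return 'redirect';
    }
    if ($code >= 400 && $code < 500) {
        return 'client error';
    }
    if ($code >= 500 && $code < 600) {
        return 'server error';
    }
    return 'unexpected';
}

// Hand-built array shaped like curl_getinfo() output, used only for the demo.
$headers = ['http_code' => 302, 'redirect_url' => 'https://example.com/account'];
echo classify_http_code($headers['http_code']), PHP_EOL; // prints "redirect"
```

<p>In a real script you would pass in the array returned by curl_getinfo($ch) and log the category whenever it is anything other than success.</p>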



<p>From there you can scrape the content and data to your heart’s content. Better still, now that you have received and stored the cookies from logging in, you have access to the ‘authenticated only’ sections of the website, so you can run further cURL requests against those areas.</p>
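
<p>A follow-up request that reuses the stored cookies might look something like the sketch below. The helper names build_cookie_options() and fetch_with_cookies() and the example URL are placeholders I made up; the cookie file name matches the cookies.txt used elsewhere in this post.</p>

```php
<?php
// Sketch only: both helpers are hypothetical and the URL in the usage
// comment is a placeholder, not a real endpoint.
function build_cookie_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // send the cookies saved at login
        CURLOPT_COOKIEJAR      => $cookieFile, // write back any updated cookies
    ];
}

function fetch_with_cookies(string $url, string $cookieFile = 'cookies.txt'): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_cookie_options($url, $cookieFile));
    $content = curl_exec($ch);    // false if the request failed
    $headers = curl_getinfo($ch); // includes 'http_code'
    curl_close($ch);
    return [$content, $headers];
}

// Usage (placeholder URL):
// [$content, $headers] = fetch_with_cookies('https://example.com/members/profile');
```

<p>Keeping the option list in one function means every authenticated request sends the same cookie file, which is the part that is easy to forget.</p>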



<p>I was about to end it there, but one other important thing I have come across is that some forms redirect you to a different part of the website after you submit them. This is fairly easy to identify because you will get an http_code of 302, and the great thing is that you also get a redirect_url in the headers. All you need to do is make another cURL request using the redirect URL you received.</p>



<div class="wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#575279;--cbp-line-number-width:calc(2 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#faf4ed"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="// original curl request up here
if ($headers['http_code'] == 302){
  $ch = @curl_init();
  curl_setopt($ch, CURLOPT_URL, $headers['redirect_url']);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file   
  curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar   
  $content = curl_exec($ch);
}" style="color:#575279;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki rose-pine-dawn" style="background-color: #faf4ed" tabindex="0"><code><span class="line"><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> original curl request up here</span></span>
<span class="line"><span style="color: #286983">if</span><span style="color: #575279"> </span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">headers</span><span style="color: #797593">[</span><span style="color: #EA9D34">&#39;http_code&#39;</span><span style="color: #797593">]</span><span style="color: #575279"> </span><span style="color: #286983">==</span><span style="color: #575279"> </span><span style="color: #D7827E">302</span><span style="color: #797593">){</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #286983">@</span><span style="color: #B4637A; font-style: italic">curl_init</span><span style="color: #797593">();</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_URL</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">headers</span><span style="color: #797593">[</span><span style="color: #EA9D34">&#39;redirect_url&#39;</span><span style="color: #797593">]);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_CONNECTTIMEOUT</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">15</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_RETURNTRANSFER</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #D7827E">true</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_COOKIEFILE</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;cookies.txt&#39;</span><span style="color: #797593">);</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> set cookie file to given file   </span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #B4637A; font-style: italic">curl_setopt</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #286983">CURLOPT_COOKIEJAR</span><span style="color: #797593">,</span><span style="color: #575279"> </span><span style="color: #EA9D34">&#39;cookies.txt&#39;</span><span style="color: #797593">);</span><span style="color: #575279"> </span><span style="color: #797593; font-style: italic">//</span><span style="color: #9893A5; font-style: italic"> set same file as cookie jar   </span></span>
<span class="line"><span style="color: #575279">  </span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">content</span><span style="color: #575279"> </span><span style="color: #286983">=</span><span style="color: #575279"> </span><span style="color: #B4637A; font-style: italic">curl_exec</span><span style="color: #797593">(</span><span style="color: #797593; font-style: italic">$</span><span style="color: #575279; font-style: italic">ch</span><span style="color: #797593">);</span></span>
<span class="line"><span style="color: #797593">}</span></span></code></pre><span style="display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#faf4ed;color:#625c88;font-size:12px;line-height:1;position:relative">PHP</span></div>
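
<p>As an alternative to handling the 302 by hand, cURL can follow redirects for you via the CURLOPT_FOLLOWLOCATION option. This is not what my original script does, so treat the following as a sketch; build_redirect_options() is a hypothetical helper and the URL in the usage comment is a placeholder.</p>

```php
<?php
// Sketch only: build_redirect_options() is a hypothetical helper.
function build_redirect_options(string $url, string $cookieFile, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers automatically
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many hops
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];
}

// Usage (placeholder URL, not from the original post):
// $ch = curl_init();
// curl_setopt_array($ch, build_redirect_options('https://example.com/login', 'cookies.txt'));
// $content  = curl_exec($ch);
// $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // where the request ended up
// curl_close($ch);
```

<p>With this approach the http_code you get back belongs to the final page rather than the redirect itself, so the manual check is still the better choice when you want to inspect each hop.</p>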



<p>Once you have the content you require, you then need to pull out the specific data or text you’re after. I’m going to show you how to do this using XPath in another post, so keep an eye out for that.</p>



<p>As always, if you have any comments or questions, feel free to post them and I’ll do my best to answer ’em.</p>



<p>Follow me on Twitter <a href="https://twitter.com/JAGracie">@JAGracie</a></p><p>The post <a href="https://www.devhour.net/scraping-data-with-php-and-curl/">Scraping data with PHP and cURL</a> first appeared on <a href="https://www.devhour.net">Devhour</a>.</p>]]></content:encoded>
					
					<wfw:commentRss>https://www.devhour.net/scraping-data-with-php-and-curl/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
