Hello, I'm having an issue with scraping using PHP. Some say I should use DOM, but I'm not sure how to use it for this.

 

The first box is part of the page that I get; the other is my code to get the 2,000,000 out of it, but it just returns the whole page.

<tr style="background-color: #ffffff;">
<td style="text-align: left; border-right: 0px; border-top: 0px;"><span style="padding-left: 3px;">Bank:</span></td>
<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">2,000,000 Dollar</span></td>
</tr>

$start_description1 = 'Bank:</span></td>
<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">';
$end_description1 = ' Dollar</span></td>
</tr>';
$description1_start_pos = strpos($temp, $start_description1) + strlen($start_description1);
$description1_end_pos = strpos($temp, $end_description1) - $description1_start_pos;
$description1 = substr($temp, $description1_start_pos, $description1_end_pos);

print_r($description1);

 

Back-end developer, electronics "hacker"

Link to comment
https://linustechtips.com/topic/757462-php-scraping-issues/

$start_description1 = 'Bank:</span></td>' . "\n" . '<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">';

\n is the escape sequence for a line break (line feed). You have to state explicitly that there is a line break at this position, otherwise PHP can't find the string. Note that PHP only interprets \n inside double quotes, which is why it is concatenated as "\n" here.
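A related possibility, offered only as an assumption about the scraped page: if it uses Windows line endings (\r\n), a marker containing a bare line feed will never match. A minimal sketch that normalizes line endings before searching; normalize_newlines is a hypothetical helper name and the markup is a shortened stand-in for the real page:

```php
<?php
// Hypothetical helper: convert \r\n and lone \r to \n so that markers
// written with "\n" can match regardless of the page's line endings.
function normalize_newlines(string $html): string
{
    return str_replace(["\r\n", "\r"], "\n", $html);
}

// Single quotes keep \n as two literal characters; double quotes make it a line feed.
$literal  = '\n';   // backslash + n
$linefeed = "\n";   // one line-feed character

// Assumed sample: simplified markup with \r\n line endings.
$page   = "Bank:</span></td>\r\n<td><span>2,000,000 Dollar</span></td>";
$marker = 'Bank:</span></td>' . "\n" . '<td><span>';

$before = strpos($page, $marker);                      // false: "\r\n" is not "\n"
$after  = strpos(normalize_newlines($page), $marker);  // 0 once normalized
```

Whether this applies depends on what the server actually sends, so it is worth checking the raw bytes before relying on it.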

3 minutes ago, Organized said:

$start_description1 = 'Bank:</span></td>' . "\n" . '<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">';

 

Doesn't look like it made a difference. I also added it to $end_description1.

2 minutes ago, Joveice said:

Doesn't look like it made a difference. I also added it to $end_description1.

Wait, it did. It no longer prints "/>" at the top of the page before the rest of the content; I now see that it did that before.

Okay, I tried it by myself now, with this code:

 

<?php
$temp = '<tr style="background-color: #ffffff;">
<td style="text-align: left; border-right: 0px; border-top: 0px;"><span style="padding-left: 3px;">Bank:</span></td>
<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">2,000,000 Dollar</span></td>
</tr>';

$start_description1 = 'Bank:</span></td>
<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">';
$end_description1 = ' Dollar</span></td>
</tr>';
$description1_start_pos = strpos($temp, $start_description1) + strlen($start_description1);
$description1_end_pos = strpos($temp, $end_description1) - $description1_start_pos;
$description1 = substr($temp, $description1_start_pos, $description1_end_pos);

print_r($description1);
?>

It prints out "2,000,000", so the extraction logic works. I think your $temp is probably a bit different from what you have posted here.

 

Maybe try out

echo htmlentities($temp);

It will print the raw markup as text instead of letting your browser render the HTML.
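One plausible explanation for getting a large slice of the page back, assuming one of the markers is simply not found in the real $temp: strpos then returns false, the arithmetic silently treats it as 0, and substr is called with an offset and length that no longer mean anything. A sketch that checks each lookup instead; extract_between is a hypothetical helper name, not part of the original code, and the sample markup is shortened:

```php
<?php
// Hypothetical helper: return the text between $start and $end, or null when
// either marker is missing (instead of silently doing arithmetic on false).
function extract_between(string $haystack, string $start, string $end): ?string
{
    $startPos = strpos($haystack, $start);
    if ($startPos === false) {
        return null; // start marker not found
    }
    $from = $startPos + strlen($start);

    // Search for the end marker only after the start marker.
    $endPos = strpos($haystack, $end, $from);
    if ($endPos === false) {
        return null; // end marker not found
    }
    return substr($haystack, $from, $endPos - $from);
}

// Assumed sample, shortened from the markup in the first post.
$temp = '<td><span style="padding-left: 3px;">Bank:</span></td>' . "\n"
      . '<td><span style="padding-left: 3px;">2,000,000 Dollar</span></td>';

$startMarker = 'Bank:</span></td>' . "\n" . '<td><span style="padding-left: 3px;">';
$amount  = extract_between($temp, $startMarker, ' Dollar</span>');
$missing = extract_between($temp, 'Balance:', ' Dollar</span>');
```

A null result then says which assumption about the page is wrong, rather than producing output that looks like the whole document.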

Hm, well, that filled the page with all the HTML. Also, since you got it to work, I pasted it into phptester.net and got it to work there too. Could there be anything else on the page that screws this up? The strings are not duplicates and should be unique.

3 minutes ago, Organized said:

You could echo all your variables for debugging purposes. This way you can look at the raw values; often you can see what's wrong by doing that.

Well, I'll try and see if I can figure it out; otherwise I'm going to leave it. It was just for visuals, as the main scraping part works.
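As a sketch of what that kind of debugging could look like here (the helper name show_invisible and the sample string are assumptions for illustration): var_dump on the strpos result distinguishes false from 0, and addcslashes makes otherwise invisible characters such as \r, \n and tabs visible.

```php
<?php
// Hypothetical helper: show a slice of $text starting at $needle with control
// characters escaped, so stray \r, \n or \t become visible.
function show_invisible(string $text, string $needle, int $context = 20): ?string
{
    $pos = strpos($text, $needle);
    if ($pos === false) {
        return null;
    }
    $slice = substr($text, $pos, strlen($needle) + $context);
    // Escape every character with a code from 0 to 31 (octal \0 to \37).
    return addcslashes($slice, "\0..\37");
}

// Assumed sample: a carriage return and a tab hide between the two cells.
$temp = "Bank:</span></td>\r\n\t<td>2,000,000 Dollar</td>";

var_dump(strpos($temp, "Bank:</span></td>\n")); // not found: a \r precedes the \n
echo show_invisible($temp, 'Bank:', 16), PHP_EOL;
```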

4 minutes ago, Organized said:

You should have a look at this: https://regex101.com/r/1mjmmx/1

 

It's a good opportunity to learn regular expressions, as it's the best way to do such scraping things.

I have tried to understand regular expressions before, but I never got the hang of them :P
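For readers in the same position, a small hedged sketch of the regular-expression route. The pattern behind the regex101 link is not reproduced in this thread, so the one below is an assumption written against the HTML from the first post, and extract_bank_amount is a hypothetical name:

```php
<?php
// Hypothetical helper: pull the amount that follows the "Bank:" cell.
// Returns null when the pattern does not match.
function extract_bank_amount(string $html): ?string
{
    // "Bank:</span></td>", optional whitespace, the next <td ...><span ...>,
    // then digits and separators (captured), followed by "Dollar".
    $pattern = '~Bank:</span>\s*</td>\s*<td[^>]*>\s*<span[^>]*>\s*([\d,.]+)\s*Dollar~i';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$temp = '<tr style="background-color: #ffffff;">
<td style="text-align: left; border-right: 0px; border-top: 0px;"><span style="padding-left: 3px;">Bank:</span></td>
<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">2,000,000 Dollar</span></td>
</tr>';

$amount = extract_bank_amount($temp);
// Strip the separators if a number is needed rather than a string.
$value = $amount === null ? null : (int) str_replace(',', '', $amount);
```

Because \s* matches any run of whitespace, this version does not care whether the line break between the two cells is \n, \r\n, or absent.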

If you want to keep things simple, I'd start with stripos and strripos.

The i in the name means case is ignored, and the r in the name means reverse.

So I'd first look for the phrase " dollar</span>", since there should always be "dollar" after a currency amount, right? Once you have found that with stripos, you can use strripos (or strrpos, since ">" has no case) to search backwards for the nearest ">" before the offset that stripos returned.

Something like this:
 

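// Position of the first " dollar</span>" (case-insensitive), i.e. where the amount ends.
$finish = stripos($text, ' dollar</span>');

if ($finish !== false) {
	// A positive offset would make strrpos search forward from that point,
	// so cut the text at $finish and take the last ">" before it instead.
	$start = strrpos(substr($text, 0, $finish), '>');
	if ($start !== false) {
		// Everything between that ">" and the space before "dollar".
		$subtext = substr($text, $start + 1, $finish - $start - 1);
	}
}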

 

Yes, the string will always be the same; the amount never will.
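Coming back to the DOM suggestion from the first post, a minimal sketch using PHP's DOM extension (DOMDocument and DOMXPath), which has to be enabled in the PHP build. It assumes the label cell contains exactly "Bank:" and that the value sits in the next <td>; the function name bank_amount_from_dom and the wrapping <table> are illustrative:

```php
<?php
// Hypothetical helper: find the cell after "Bank:" and return the amount text.
function bank_amount_from_dom(string $html): ?string
{
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The first <td> after the cell whose text is "Bank:".
    $nodes = $xpath->query('//td[normalize-space(.)="Bank:"]/following-sibling::td[1]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $text = trim($nodes->item(0)->textContent);
    // Drop the trailing currency word if present.
    return preg_replace('/\s*Dollar$/i', '', $text);
}

$page = '<table><tr style="background-color: #ffffff;">
<td style="text-align: left; border-right: 0px; border-top: 0px;"><span style="padding-left: 3px;">Bank:</span></td>
<td style="text-align: left; border-left: 0px; border-top: 0px;"><span style="padding-left: 3px;">2,000,000 Dollar</span></td>
</tr></table>';

$amount = bank_amount_from_dom($page);
```

The trade-off compared with the string functions is that this depends on the table structure rather than on exact attribute text, so style changes or different line breaks on the page should not affect it.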
