I am interested in writing a Perl script that goes to the following link and extracts
the number 1975:
https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
That page shows the number of white men born in 1923 who lived in San Diego County,
California, in 1940. I am trying to do this in a loop to generalize over multiple
counties and birth years.
In the file locations.txt, I put the list of counties, such as San Diego County.
The current code runs, but instead of the number 1975, it displays "unknown". The
number 1975 should end up in $val.
I would very much appreciate any help!
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# One county name per line, e.g. "San Diego"
open(my $in, '<', 'locations26.txt') or die "Cannot open locations26.txt: $!";

# %LOCATION% and %YEAR% are placeholders filled in for each request
my $url = 'https://familysearch.org/search/collection/results#count=20&query='
        . '%2Bevent_place_level_1%3A%22California%22%20%2Bevent_place_level_2%3A%22%LOCATION%'
        . '%22%20%2Bbirth_year%3A%YEAR%-%YEAR%~%20%2Bgender%3AM%20%2Brace'
        . '%3AWhite&collection_id=2000219';

open(my $out, '>', 'out26.txt') or die "Cannot open out26.txt: $!";
# Turn off buffering on the output file
my $oldh = select($out);
$| = 1;
select($oldh);

while (my $location = <$in>) {
    chomp($location);
    (my $encoded = $location) =~ s/ /+/g;   # spaces become + in the query
    foreach my $year (1923..1923) {
        my $u = $url;
        $u =~ s/%LOCATION%/$encoded/;
        $u =~ s/%YEAR%/$year/g;             # the placeholder appears twice
        #print "$u\n";
        my $content = get($u);              # undef if the request fails
        my $val = 'unknown';
        if (defined $content && $content =~ / of .strong.([0-9,]+)..strong. /) {
            $val = $1;
        }
        $val =~ s/,//g;
        print "'$location',$year,$val\n";
        print {$out} "'$location',$year,$val\n";
    }
}
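To make the URL-building step easier to check on its own, here is a minimal sketch that fills the two placeholders without making any request. The helper name fill_template and the shortened template string are made up for illustration; they are not part of the script or of the site. The template contains %YEAR% twice, so the substitution needs the /g flag to fill both occurrences.

```perl
use strict;
use warnings;

# Hypothetical helper: fill the %LOCATION% and %YEAR% placeholders in a template.
sub fill_template {
    my ($template, $location, $year) = @_;
    (my $encoded = $location) =~ s/ /+/g;   # same space-to-plus encoding as the script
    my $u = $template;
    $u =~ s/%LOCATION%/$encoded/g;
    $u =~ s/%YEAR%/$year/g;                 # /g fills every occurrence, not just the first
    return $u;
}

# Shortened stand-in for the real query string (an assumption, for illustration only).
my $template = 'place=%22%LOCATION%%22&birth_year=%YEAR%-%YEAR%~';
print fill_template($template, 'San Diego', 1923), "\n";
```

Printing the filled URL for one county and one year before looping over the whole file is a cheap way to confirm that no placeholder is left over.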
Update: The API is not a viable solution. I have been in contact with the site developer.
The API does not apply to that part of the webpage. Hence, any solution pertaining to
JSON will not be applicable.
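One thing worth checking, offered as a possible cause rather than a confirmed diagnosis: everything after the # in a URL is a fragment, and HTTP clients do not send the fragment to the server. If the page fills in its results with JavaScript based on that fragment (an assumption about this site), the HTML returned to the script would never contain the count. The sketch below only splits a URL to show which part is actually requested; split_fragment is a hypothetical helper and the URL is shortened for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: separate the part of a URL that is sent in the HTTP
# request from the fragment, which stays on the client side.
sub split_fragment {
    my ($url) = @_;
    my ($request, $fragment) = split /#/, $url, 2;
    return ($request, defined $fragment ? $fragment : '');
}

# Shortened version of the search URL, for illustration only.
my $url = 'https://familysearch.org/search/collection/results'
        . '#count=20&collection_id=2000219';
my ($request, $fragment) = split_fragment($url);
print "requested: $request\n";
print "fragment:  $fragment\n";
```

Uncommenting the print of $u and dumping $content to a file would show whether the text matched by the regex is present in the downloaded page at all.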
Source: http://stackoverflow.com/questions/14654288/using-perl-to-scrape-a-website