Extracting links from an HTML document: HTML::LinkExtor | Program Memo


■ Download

HTML::Parser ( http://search.cpan.org/~gaas/HTML-Parser/ )

(HTML::LinkExtor, Japanese translation)

http://homepage3.nifty.com/hippo2000/perltips/html/LinkExtor.htm

(HTML::Parser, Japanese translation)

http://homepage3.nifty.com/hippo2000/perltips/html/Parser.htm

■ How to use HTML::Parser

http://www.geocities.co.jp/SiliconValley-Sunnyvale/6128/perl/htmlpaser.html

Sample code

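#!/usr/local/bin/perl
use strict;
use warnings;
use lib qw(./extlib);    # also search ./extlib for locally installed modules

use HTML::LinkExtor;

$| = 1;                  # unbuffered output

my $start_uri = 'http://www.yahoo.co.jp/';
my $base_uri  = $start_uri;    # relative links are resolved against this URI

# Called once for each link-bearing tag; %links maps attribute name => URL.
sub cb {
    my ($tag, %links) = @_;
    print "[tag:$tag] [link:@{[%links]}]\n";
}

my $p = HTML::LinkExtor->new(\&cb, $base_uri);

# Fetch the page with the external curl command (-s hides the progress meter).
my $data = `curl -s $start_uri`;
$p->parse($data);
$p->eof;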
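
The sample above prints every link-bearing tag it finds. As a rough variation, the sketch below collects only the href values of <a> tags into an array. The inline HTML, the example.com base URI, and the @hrefs variable are all made up for illustration so that it runs without network access; it assumes HTML::LinkExtor (and the URI module it uses when a base is given) is installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical inline HTML, used instead of fetching a page with curl.
my $html = <<'HTML';
<html><body>
<a href="/news/index.html">News</a>
<img src="images/logo.png">
<a href="http://example.org/about">About</a>
</body></html>
HTML

# Assumed base URI; relative links are resolved against it.
my $base_uri = 'http://example.com/';

my @hrefs;
my $p = HTML::LinkExtor->new(
    sub {
        my ($tag, %attr) = @_;
        # Keep only <a href="...">; skip <img src>, <link href>, and so on.
        return unless $tag eq 'a' && defined $attr{href};
        # With a base URI the value is a URI object, so stringify it.
        push @hrefs, "$attr{href}";
    },
    $base_uri,
);
$p->parse($html);
$p->eof;

print "$_\n" for @hrefs;
```

Filtering inside the callback keeps the parser setup the same as in the original sample; only the body of the callback changes.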


No.66

06/04 15:33