PHP link scraping: converting relative links to absolute links

October 9, 2013

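This link-expansion function is extracted from Snoopy, and it is a handy one. It checks whether a scraped link is relative or absolute; a relative link is combined with the base URL of the page to produce an absolute link. A port in the base URL is preserved for path-relative links such as asd.html.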

<?php
/*===================================================================*
	Function:	_expandlinks
	Purpose:	expand each link into a fully qualified URL
	Input:		$links			the links to qualify
				$URI			the full URI to get the base from
	Output:		$expandedLinks	the expanded links
*===================================================================*/
function _expandlinks($links,$URI)
{
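	// Split the base URI; the host is used below to recognise same-site absolute links.
	$URI_PARTS = parse_url($URI);
	$host = $URI_PARTS["host"];
	// Drop the query string, then a trailing file name such as /page.html, then a trailing slash.
	preg_match("/^[^?]+/",$URI,$match);
	$match = preg_replace("|/[^/.]+\.[^/.]+$|","",$match[0]);
	$match = preg_replace("|/$|","",$match);
	$match_part = parse_url($match);
	// Site root: scheme and host only (a port in $URI is not included here).
	$match_root = $match_part["scheme"]."://".$match_part["host"];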
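	// The patterns are applied in order to every link.
	$search = array( 	"|^http://".preg_quote($host)."|i",	// same-site absolute link: strip scheme and host
						"|^(/)|i",							// root-relative link
						"|^(?!http://)(?!mailto:)|i",		// any other link that lacks http:// or mailto:
						"|/\./|",							// collapse /./
						"|/[^/]+/\.\./|"					// collapse /dir/../
					);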
	$replace = array(	"",
						$match_root."/",
						$match."/",
						"/",
						"/"
					);
	$expandedLinks = preg_replace($search,$replace,$links);
	return $expandedLinks;
}
// Test cases
$r = _expandlinks('asd/asd.html','http://www.361way.com/');
echo $r;
//output http://www.361way.com/asd/asd.html
echo '<br />';
$r = _expandlinks('http://www.361way.com/asd.html','http://www.361way.com/');
echo $r;
//output http://www.361way.com/asd.html
echo '<br />';
$r = _expandlinks('asd.html','http://www.361way.com:8080/');
echo $r;
//output http://www.361way.com:8080/asd.html
?>
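
The patterns in _expandlinks only recognise http://, and $match_root is built from the scheme and host alone, so a root-relative link such as /asd.html loses a port like :8080. Below is a minimal sketch of one way around this, built on parse_url. The name resolve_link is a hypothetical helper and not part of Snoopy; the sketch assumes links have no slashes inside a query string, and protocol-relative links (//host/path) are not handled.

```php
<?php
// Hypothetical helper (not part of Snoopy): resolve $link against $base,
// keeping the scheme and port of the base URL.
function resolve_link($link, $base)
{
    // Links that already carry a scheme (http:, https:, mailto:, ...) pass through.
    if (preg_match('|^[a-z][a-z0-9+.-]*:|i', $link)) {
        return $link;
    }
    $b = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host'];
    if (isset($b['port'])) {
        $root .= ':' . $b['port'];
    }
    if (substr($link, 0, 1) === '/') {
        // Root-relative link: the path is used as given.
        $path = $link;
    } else {
        // Path-relative link: append to the directory of the base path.
        $basePath = isset($b['path']) ? $b['path'] : '/';
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $link;
    }
    // Collapse "." and ".." segments without climbing above the root.
    $out = array();
    foreach (explode('/', $path) as $seg) {
        if ($seg === '.') {
            continue;
        }
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $seg;
    }
    return $root . implode('/', $out);
}

echo resolve_link('/asd.html', 'http://www.361way.com:8080/'), "\n";
// output http://www.361way.com:8080/asd.html
echo resolve_link('../img/a.png', 'https://www.361way.com/blog/post.html'), "\n";
// output https://www.361way.com/img/a.png
```

Working from the parsed URL parts rather than from regular expressions keeps the scheme and port handling in one place, at the cost of a few more lines.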

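Testing shows that the first parameter, $links, is the URL of the link to expand, and the second, $URI, is the address of the page it was found on.
For example, suppose a page you scraped contains the link <a href="asd.html">Test</a>

and the site address is http://www.test.com/. The function resolves the relative path against that address and returns the absolute URL http://www.test.com/asd.html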
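
The case above can be sketched end to end as follows. The function is a copy of _expandlinks with the dots in its patterns escaped; the $html fragment and the href regular expression are assumptions made for illustration (the regex is adequate for this snippet only, not a general HTML parser). Because preg_replace accepts an array as its subject, the whole list of links can be expanded in one call.

```php
<?php
// Copy of _expandlinks from the listing above, with the regex dots escaped.
function _expandlinks($links, $URI)
{
    $URI_PARTS = parse_url($URI);
    $host = $URI_PARTS["host"];
    preg_match("/^[^?]+/", $URI, $match);
    $match = preg_replace('|/[^/.]+\.[^/.]+$|', "", $match[0]);
    $match = preg_replace('|/$|', "", $match);
    $match_part = parse_url($match);
    $match_root = $match_part["scheme"] . "://" . $match_part["host"];
    $search = array(
        "|^http://" . preg_quote($host) . "|i",
        '|^(/)|i',
        '|^(?!http://)(?!mailto:)|i',
        '|/\./|',
        '|/[^/]+/\.\./|',
    );
    $replace = array("", $match_root . "/", $match . "/", "/", "/");
    return preg_replace($search, $replace, $links);
}

// Hypothetical page fragment standing in for scraped HTML.
$html = '<a href="asd.html">Test</a> <a href="/news/index.html">News</a>';

// Collect the href values of the anchors.
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $m);
$hrefs = $m[1];

// An array of links goes in, an array of absolute links comes out.
$absolute = _expandlinks($hrefs, 'http://www.test.com/');
foreach ($absolute as $url) {
    echo $url, "\n";
}
// output http://www.test.com/asd.html
// output http://www.test.com/news/index.html
```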



