A Bloody Mess Caused by portreserve

This afternoon I had sean set up master-slave replication for Kerberos, and we hit a problem: the service on port 754 would not come up no matter what.

kpropd -S  -d 

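By default this opens port 754, but every attempt reported that the port was already in use. So I used netstat and lsof to find out which program was holding it, and even scanned the host with nmap.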

netstat -an | grep 754
lsof -i :754

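None of them turned up anything, which was odd. So I traced the daemon with strace. Everything looked normal up to the point where SO_REUSEADDR was set, so in theory the bind should have gone through right away. Instead, the very next call failed.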

strace kpropd -S  -d
......
open("/etc/services", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=641020, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb66212c000
read(3, "# /etc/services:\n# $Id: services,v 1.48 2009/11/11 14:32:31 ovasik Exp $\n#\n# Network services, Internet style\n# IANA services version: last updated 2009-11-10\n#\n# Note that it is presently the policy of IANA to assign a single well-known\n# port number for both TCP and UDP; hence, most entries here have two entries\n# even if the protocol doesn't support UDP operations.\n# Updated from RFC 1700, ``Assigned Numbers'' (October 1994). Not all ports\n# are included, only the more common ones.\n#\n# The latest IANA port assignments can be gotten from\n# http://www.iana.org/assignments/port-numbers\n# The Well Known Ports are those from 0 through 1023.\n# The Registered Ports are those from 1024 through 49151\n# The Dynamic and/or Private Ports are those from 49152 through 65535\n#\n# Each line describes one service, and is of the form:\n#\n# service-name port/protocol [aliases ...] [# comment]\n\ntcpmux 1/tcp # TCP port service multiplexer\ntcpmux 1/udp # TCP port service multiplexer\nrje 5/tcp # Remote Job Entry\nrje 5/udp .... # Remote Job S", 4096) = 4096
read(3, "ervice\nfinger 79/tcp\nfinger 79/udp\nhttp 80/tcp www www-http # WorldWideWeb HTTP\nhttp 80/udp www www-http # HyperText Transfer Protocol\nhttp 80/sctp # HyperText Transfer Protocol\nkerberos 88/tcp kerberos5 krb5 # Kerberos v5\nkerberos 88/udp kerberos5 krb5 # Kerberos v5\nsupdup 95/tcp\nsupdup .... ", 4096) = 4096
read(3, " 209/udp # Quick Mail Transfer Protocol\nz39.50 210/tcp z3950 wais # NISO Z39.50 database\nz39.50 210/udp z3950 wais\nipx 213/tcp # IPX\nipx 213/udp\nimap3 220/tcp # Interactive Mail Access\nimap3 220/udp # Protocol v3\nlink 245/tcp ttylink\nlink .... 674/udp\nh", 4096) = 4096
read(3, "a-cluster 694/tcp # Heartbeat HA-cluster\nha-cluster 694/udp # Heartbeat HA-cluster\nkerberos-adm 749/tcp # Kerberos `kadmin' (v5)\nkerberos-adm 749/udp # kerberos administration\nkerberos-iv ... ", 4096) = 4096
read(3, " 1494/tcp # Citrix ICA Client\nica 1494/udp # Citrix ICA Client\nwins 1512/tcp # Microsoft's Windows Internet Name Service\nwins 1512/udp # Microsoft's Windows Internet Name Service\ningreslock 1524/tcp\ningreslock 1524/udp\nprospero-np 1525/tcp orasrv # Prospero non-privileged/oracle\nprospero-np 1525/udp orasrv\ndatametrics 1645/tcp old-radius sightline # datametrics / old radius entry\ndatametrics .... 2", 4096) = 4096
read(3, "603/udp # Service Meter\nnsc-ccs 2604/tcp ospfd # NSC CCS\nnsc-ccs 2604/udp # NSC CCS\nnsc-posa 2605/tcp bgpd # NSC POSA\nnsc-posa 2605/udp # NSC POSA\nnetmon 2606/tcp ospf6d # Dell Netmon\nnetmon 2606/udp # Dell Netmon\ndict 2628/tcp # RFC 2229\ndict 2628/udp # RFC 2229\ncorbaloc ... ", 4096) = 4096
read(3, " # BPRD (VERITAS NetBackup)\nbprd 13720/udp # BPRD (VERITAS NetBackup)\nbpdbm 13721/tcp # BPDBM (VERITAS NetBackup)\nbpdbm 13721/udp # BPDBM (VERITAS NetBackup)\nbpjava-msvc 13722/tcp # BP Java MSVC Protocol\nbpjava-msvc 13722/udp # BP Java MSVC Protocol\nvnetd 13724/tcp # Veritas Network Utility\nvnetd .... 5", 4096) = 4096
close(3) = 0
munmap(0x7fb66212c000, 4096) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(754), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
open("/usr/share/locale/locale.alias", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
......

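Seeing this, I figured there was only one possibility: the bind was being rejected inside the kernel itself, not because of some other userspace process. Many posts blame ports that were used and never reclaimed, but I had already turned on recycling and reuse (tcp_tw_recycle and tcp_tw_reuse), and that made no difference. Suggestions about xinetd holding the port, or about IPv6, were even further off the mark.

So I tried a few troubleshooting steps. I changed the entry for port 754 in /etc/services to 755 and started the daemon, and it came up normally. Telling the application to use port 755 directly also worked.

In the end it took one sentence from an expert to settle the matter: stop the portreserve service. This service only exists as of CentOS 6; it was never there before. Our tuning scripts are all adapted from CentOS 5, and we do not really know which services CentOS 6 enables by default or what they are for. portreserve is actually not installed by default; it was presumably pulled in during the installation of another package.

From a quick look, the service reads the port definitions under the /etc/portreserve directory and reserves those ports. A default KDC installation writes 3 files into that directory, and they include port 754.

I also took a brief look at how portreserve works. Once started, it creates a socket at /var/run/portreserve/socket. Because it never calls listen() on the ports it reserves, we could not find it with netstat or lsof.

Noting this down for future reference.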
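The behaviour described here can be reproduced without portreserve. The sketch below is only an illustration under stated assumptions: `holder` is a hypothetical stand-in for portreserve (a socket that calls bind() but never listen(), and is assumed not to set SO_REUSEADDR), and `late` plays the role of kpropd, setting SO_REUSEADDR before bind() as in the strace output. It uses an ephemeral port on 127.0.0.1 rather than 754, so it does not need root. The helper names `port_in_proc_net_tcp` and `reserve_and_probe` are made up for this example.

```python
import errno
import os
import socket


def port_in_proc_net_tcp(port, path="/proc/net/tcp"):
    """Report whether `port` appears as a local port in the kernel TCP table.

    Returns None when the table is unavailable (for example, not on Linux).
    """
    if not os.path.exists(path):
        return None
    with open(path) as table:
        next(table, None)  # skip the header line
        for line in table:
            fields = line.split()
            if len(fields) < 2:
                continue
            # The second column is local_address, formatted as HEXIP:HEXPORT.
            local_port = int(fields[1].rsplit(":", 1)[1], 16)
            if local_port == port:
                return True
    return False


def reserve_and_probe():
    """Bind a port without listening, then try to bind it a second time."""
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    late = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Stand-in for portreserve: bind() only, no listen().
        holder.bind(("127.0.0.1", 0))
        port = holder.getsockname()[1]
        visible = port_in_proc_net_tcp(port)

        # Stand-in for kpropd: SO_REUSEADDR followed by bind().
        late.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            late.bind(("127.0.0.1", port))
            bind_errno = 0
        except OSError as exc:
            bind_errno = exc.errno
    finally:
        late.close()
        holder.close()
    return port, visible, bind_errno


if __name__ == "__main__":
    port, visible, bind_errno = reserve_and_probe()
    print("port:", port)
    print("listed in /proc/net/tcp:", visible)
    print("second bind errno:", errno.errorcode.get(bind_errno, bind_errno))
```

On Linux the second bind() is expected to fail with EADDRINUSE, while the reserved port should normally be absent from /proc/net/tcp, the table netstat reads, because the holder never listens or connects.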