Introduction to Baiduspider
1. What is Baiduspider?
Baiduspider is the automated crawler of the Baidu search engine. It visits pages on the Internet and builds Baidu's index database, so that users can find the pages of your site through Baidu search.
2. What are Baiduspider's user-agents?
Baidu products use different user-agents:
Product                      User-agent
Image search                 Baiduspider-image
Video search                 Baiduspider-video
News search                  Baiduspider-news
Baidu Soucang (bookmarks)    Baiduspider-favo
Baidu Union                  Baiduspider-cpro
Commercial search            Baiduspider-ads
3. How much load does Baiduspider place on a website's server?
To return good search results for your content, Baiduspider needs to crawl your site at a certain rate. We try not to place an unreasonable load on websites, and we adjust the crawl rate according to factors such as the site's quality and how often it is updated. If you feel that Baiduspider is visiting your site unreasonably, please report it through the feedback center.
4. Why does Baiduspider keep crawling my website?
Baiduspider keeps crawling pages on your site that are new or continuously updated. You can also check the Baiduspider visits recorded in your site's access log to make sure they are genuine, since someone may be impersonating Baiduspider to crawl your site aggressively. If you find that Baiduspider is crawling your site abnormally, please report it through the feedback center and include the access-log entries for Baiduspider's visits to your site, so that we can track and deal with the problem.
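As a rough first pass at the log check described above, the sketch below counts requests per client IP whose User-Agent claims to be Baiduspider. It assumes your server writes the common Apache/Nginx "combined" log format; adjust the regular expression for any other layout. The function name baiduspider_hits is hypothetical. Because the User-Agent header is trivial to forge, these counts only show claimed Baiduspider traffic; the IPs still need the DNS check described in section 5.

```python
import re
from collections import Counter

# Assumed log layout: Apache/Nginx "combined" format, e.g.
# 1.2.3.4 - - [date] "GET / HTTP/1.1" 200 512 "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def baiduspider_hits(lines):
    """Count requests per client IP whose User-Agent mentions Baiduspider.

    Lines that do not match the assumed log format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and "baiduspider" in match.group("ua").lower():
            counts[match.group("ip")] += 1
    return counts
```

For example, opening your access log and calling baiduspider_hits(log_file).most_common(20) lists the IPs sending the most claimed Baiduspider requests, which are the first candidates to verify.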
5. How can I tell whether a visitor is impersonating Baiduspider?
We recommend using a reverse DNS lookup to determine whether the IP a request came from belongs to Baidu. The command differs by operating system; for Linux, Windows/OS/2, and Mac OS it is as follows:
5.1 On Linux, run host IP to do a reverse lookup of the IP and determine whether the request really came from Baiduspider. Genuine Baiduspider hostnames take the form *.baidu.com or *.baidu.jp; any other hostname means the visitor is impersonating Baiduspider.
For example:
$ host 119.63.195.254
254.195.63.119.in-addr.arpa domain name pointer BaiduMobaider-119-63-195-254.crawl.baidu.jp.
5.2 On Windows or IBM OS/2, use the nslookup command: open a command prompt and enter nslookup xxx.xxx.xxx.xxx (the IP address) to resolve the IP and determine whether the request came from Baiduspider. Genuine Baiduspider hostnames take the form *.baidu.com or *.baidu.jp.
5.3 On Mac OS, use the dig command: open a terminal and enter dig -x xxx.xxx.xxx.xxx (the IP address) to do a reverse lookup and determine whether the request came from Baiduspider. Genuine Baiduspider hostnames take the form *.baidu.com or *.baidu.jp.
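The manual lookups above can be automated. The sketch below does the reverse lookup and checks the hostname suffix, then, as an extra safeguard that this FAQ does not itself require, does a forward lookup of the returned hostname to confirm it maps back to the same IP (an impersonator who controls the reverse DNS for their own IP can set any hostname, but cannot make Baidu's forward DNS point at that IP). It assumes IPv4 and uses only Python's standard socket module; the function names are hypothetical. The resolvers are parameters so the logic can be tested without network access.

```python
import socket
from typing import Callable, List

# Hostname suffixes this FAQ lists for genuine Baiduspider crawlers.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname: str) -> bool:
    """True if hostname (optionally ending in '.') is under baidu.com or baidu.jp."""
    return hostname.rstrip(".").lower().endswith(BAIDU_SUFFIXES)


def _reverse(ip: str) -> str:
    # PTR lookup, the same question "host IP" / "nslookup IP" / "dig -x IP" ask.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    # IPv4 addresses the hostname resolves to.
    return socket.gethostbyname_ex(hostname)[2]


def is_genuine_baiduspider(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Reverse-resolve ip, check the Baidu suffix, then confirm the name maps back to ip."""
    try:
        hostname = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not is_baidu_hostname(hostname):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

Called as is_genuine_baiduspider("119.63.195.254"), it queries your system resolver for both lookups; any lookup failure is treated as "not genuine".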
6. I don't want Baiduspider to visit my website. What should I do?
Baiduspider follows the robots exclusion protocol. You can use a robots.txt file to block Baiduspider from your entire site or only from some of its files. Note: blocking Baiduspider makes your pages unsearchable in Baidu search and in all other search services provided by Baidu. For how to write robots.txt, please refer to our guide on writing robots.txt.
You can set different crawling rules for the user-agent of each product. To block all Baidu products at once, simply disallow Baiduspider.
The following robots.txt blocks all crawling by Baidu:
User-agent: Baiduspider
Disallow: /
The following robots.txt blocks all crawling by Baidu but allows image search to crawl the /image/ directory:
User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-image
Allow: /image/
Disallow: /
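To sanity-check rules like these before deploying them, you can evaluate them locally. The sketch below uses Python's standard urllib.robotparser as a stand-in; it is not Baidu's own matcher, so treat its verdicts as an approximation. Two quirks of that parser shape the example: it applies the first group whose user-agent name appears as a substring of the crawler's name (and "baiduspider" is a substring of "baiduspider-image"), and within a group it applies the first matching Allow or Disallow line. The Baiduspider-image group is therefore listed first here, with Allow before Disallow. The helper can_crawl and the example.com URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# The "block Baidu except image search under /image/" rules, reordered so
# Python's first-match parser picks the more specific Baiduspider-image group.
ROBOTS_TXT = """\
User-agent: Baiduspider-image
Allow: /image/
Disallow: /

User-agent: Baiduspider
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether robots_txt lets user_agent fetch url, per urllib.robotparser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("Baiduspider-image", "https://example.com/image/logo.png"),
    ("Baiduspider-image", "https://example.com/about.html"),
    ("Baiduspider", "https://example.com/image/logo.png"),
    ("Baiduspider-news", "https://example.com/"),
]:
    print(f"{agent:18} {url:36} {can_crawl(ROBOTS_TXT, agent, url)}")
```

Only the first check is allowed: the image crawler may fetch /image/, while every other Baidu user-agent, including product crawlers such as Baiduspider-news, falls under the Baiduspider group and is blocked.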
Please note: pages fetched by Baiduspider-cpro are not added to the index. It only performs operations agreed with clients, so it does not follow the robots protocol. If Baiduspider-cpro causes problems for you, please contact email@example.com.
Likewise, pages fetched by Baiduspider-ads are not added to the index. It only performs operations agreed with customers, so it does not follow the robots protocol. If Baiduspider-ads causes problems for you, please contact your customer service specialist.
7. Why can my website still be found in Baidu search after I added a robots.txt?
Updating the search engine's index database takes time. Although Baiduspider has stopped visiting the pages on your site, it may take months for pages already indexed to be removed from Baidu's database. Please also check that your robots.txt is configured correctly.
If your removal request is urgent, you can also submit it through the feedback center.
8. I want my website to be indexed by Baidu but without a cached snapshot. What should I do?
Baiduspider follows the meta robots protocol. You can add a meta tag to a page so that Baidu indexes the page but does not show a snapshot of it in the search results.
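The conventional form of this tag is a meta robots element carrying the noarchive directive, for instance <meta name="robots" content="noarchive"> for all search engines or <meta name="Baiduspider" content="noarchive"> for Baidu only; treat these exact forms as an assumption to confirm against Baidu's current documentation. The sketch below uses Python's standard html.parser to report whether a page's HTML carries such a directive, which helps confirm the tag actually reached the pages your server returns. The class NoArchiveFinder and the function blocks_snapshot are hypothetical names.

```python
from html.parser import HTMLParser


class NoArchiveFinder(HTMLParser):
    """Detects noarchive in <meta name="robots"> or <meta name="Baiduspider">."""

    # Assumed meta names that address Baidu: the generic "robots" name and
    # the crawler-specific "baiduspider" name (compared case-insensitively).
    TARGET_NAMES = {"robots", "baiduspider"}

    def __init__(self):
        super().__init__()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").strip().lower() not in self.TARGET_NAMES:
            return
        # content is a comma-separated directive list, e.g. "index, noarchive".
        directives = {d.strip().lower() for d in values.get("content", "").split(",")}
        if "noarchive" in directives:
            self.noarchive = True


def blocks_snapshot(html: str) -> bool:
    """Return True if the page asks robots (or Baiduspider) not to keep a snapshot."""
    finder = NoArchiveFinder()
    finder.feed(html)
    finder.close()
    return finder.noarchive
```

Self-closing tags such as <meta ... /> are handled too, because html.parser routes them through handle_starttag.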
As with robots.txt changes, updating the index database takes time. If you use a meta tag to stop Baidu from showing a page's snapshot in the search results, but Baidu's database already holds index information for that page, it may take two to four weeks for the change to take effect online.
9. Can Baiduspider's crawling cause bandwidth congestion?
Normal crawling by Baiduspider does not congest your website's bandwidth. Congestion is more likely caused by someone maliciously crawling your site while posing as Baiduspider.