Crawling for Authors
Users are encouraged to submit content that they deem appropriate to the CiteSeerX collection. CiteSeerX crawls and indexes documents that are freely and publicly available on the Web.
If you do not want your documents crawled by CiteSeerX, please use a robots.txt file to disallow our crawler, named citeseerxbot.
We require that all content be submitted through links to publicly accessible documents on the Web. Please make sure that you have granted the relevant permissions and that your robots.txt permits our bot, citeseerxbot, to crawl the documents. Once we receive a link submission, the link is queued for crawling and processed dynamically. Allow several weeks before the documents are indexed by CiteSeerX.
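Before submitting a link, it may help to confirm locally that your robots.txt does not block citeseerxbot. The following is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt rules, the `example.org` URL, and the helper name `is_crawl_allowed` are assumptions made for illustration; only the crawler name comes from the text above. Python's parser may also interpret edge cases differently from the crawler itself, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The Allow/Disallow lines are assumptions for
# illustration; only the crawler name "citeseerxbot" comes from the guide.
SAMPLE_ROBOTS_TXT = """\
User-agent: citeseerxbot
Allow: /yourdirectory/

User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "citeseerxbot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    paper = "https://example.org/yourdirectory/paper.pdf"  # placeholder URL
    print("citeseerxbot allowed:", is_crawl_allowed(SAMPLE_ROBOTS_TXT, paper))
    print("other bot allowed:", is_crawl_allowed(SAMPLE_ROBOTS_TXT, paper, agent="someotherbot"))
```

To check a live site instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the robots.txt before `can_fetch()` is called.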
Crawling for Open-Access Publishers, Venues, and Repositories
Thank you for requesting that CiteSeerX crawl and index your papers. It is important to note that CiteSeerX indexes papers, not venues. Please ensure the following:
- Make sure that your papers are in PDF. These documents should be freely available to anyone who visits your site, without any login credentials. Currently, CiteSeerX only indexes papers written in English.
- In order for CiteSeerX to crawl your site, please edit your robots.txt file to allow our crawler to download your papers. The content of robots.txt should be either
  User-agent: *
  to allow any crawler to completely crawl your site, or
  User-agent: citeseerxbot
  The latter will specifically allow our crawler, citeseerxbot, to crawl the files in /yourdirectory/.
- We prefer a list of URLs that directly link to your PDF files. You can also generate a sitemap which contains URLs of PDF files. This speeds up downloads.
- If you cannot provide direct links to the PDF files, then please provide a direct URL to the archive(s) containing all of your papers.
- Once crawled, your papers will have their metadata extracted, parsed, and imported into our database, and will then be indexed. Please allow at least a week for your papers to appear in CiteSeerX. If you do not see your papers after two weeks, please contact us.
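The sitemap mentioned in the list above can be produced with a few lines of code. The sketch below builds a sitemap in the sitemaps.org 0.9 format that contains only URLs ending in `.pdf`. The function name `build_pdf_sitemap` and the `example.org` URLs are hypothetical, and CiteSeerX may accept other layouts as well; this is one reasonable way to do it, not a required format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_pdf_sitemap(urls):
    """Return sitemap XML (as a string) listing only URLs whose path ends in .pdf."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Compare the path only, so query strings do not hide the extension.
        if urlparse(url).path.lower().endswith(".pdf"):
            entry = ET.SubElement(root, "url")
            ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    candidates = [
        "https://example.org/papers/a.pdf",
        "https://example.org/papers/index.html",
        "https://example.org/papers/b.PDF",
    ]
    print(build_pdf_sitemap(candidates))
```

Filtering on the URL path keeps landing pages and other non-PDF links out of the sitemap, so the crawler is pointed only at files it can index.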
*Publishers policy on self-archiving of your publications.
Supported File Formats
- PDF: (Recommended) We are generally able to convert PDF documents in such a way as to preserve UTF-8 character codes. We therefore recommend submitting content in this format, particularly if your files contain characters that cannot be correctly represented within the ASCII character set.
- PS: We do support PostScript files; however, text conversion will be limited to ASCII-only due to limitations in standard PostScript text extractors.
- ZIP | GZ | Z: Common compression formats such as zip, gzip, and UNIX compress are all supported.
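If you want to verify that a file really is in one of the formats listed above before linking to it, one option is to inspect its leading bytes rather than trusting the file extension. The sketch below does this with well-known file signatures. The helper name `detect_format` is hypothetical, and nothing here is claimed to mirror how CiteSeerX itself identifies files; it is only a local pre-submission check.

```python
from typing import Optional

# Well-known leading byte signatures for the formats listed above.
MAGIC_PREFIXES = [
    (b"%PDF-", "pdf"),        # PDF header, e.g. "%PDF-1.7"
    (b"%!", "ps"),            # PostScript, typically "%!PS-Adobe-..."
    (b"PK\x03\x04", "zip"),   # ZIP local file header
    (b"\x1f\x8b", "gz"),      # gzip
    (b"\x1f\x9d", "Z"),       # UNIX compress (LZW)
]


def detect_format(data: bytes) -> Optional[str]:
    """Return a short format name based on the first bytes, or None if unknown."""
    for prefix, name in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return name
    return None


def detect_file_format(path: str) -> Optional[str]:
    """Read the first few bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return detect_format(handle.read(8))
```

A result of `None` does not necessarily mean the file is unusable. Some PDFs, for example, carry a few stray bytes before the header, which this simple prefix check would miss.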