What exactly is the arrangement between search engines and news providers?

Hi
What exactly is the arrangement between search engines and news providers regarding the distribution of news content? I remember reading of an agreement between the New York Times and Google some years ago on this very topic.

As of now, if you haven’t subscribed to the New Yorker magazine, you can’t access their articles. Does the New Yorker make agreements with individual search engines not to distribute their content to non-subscribers? And what if subscribers were to copy that content and post it on blogs? Would search engines block it? I look forward to your feedback.
davidmich

Any site can set up a robots.txt file telling search engine crawlers which parts of the site they should not crawl.
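For illustration, here is a minimal Python sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using the standard library's `urllib.robotparser`. The rules, the `news.example.com` domain, and the `ExampleBot` user agent are all hypothetical, not any real publisher's file. Keep in mind that robots.txt is only a request that compliant crawlers honor. It is not access control, which is why subscriber-only articles sit behind a login instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site (not a real publisher's file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /subscriber-only/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://news.example.com/public/story"),
        ("Googlebot", "https://news.example.com/subscriber-only/story"),
        ("ExampleBot", "https://news.example.com/public/story"),
    ]
    for agent, url in checks:
        # can_fetch() only reports what the rules ask for; nothing stops a
        # non-compliant crawler from requesting the URL anyway.
        print(f"{agent:<10} {url} -> allowed={rp.can_fetch(agent, url)}")
```

Googlebot falls under the `*` group, so it may crawl `/public/` but not `/subscriber-only/`, while the hypothetical `ExampleBot` is asked to stay out of the whole site.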

Copyright holders can request that Google and other search engines remove copyrighted material from their search results, and Google does so tens of millions of times a year.
http://www.google.com/transparencyreport/removals/

Crawlers can only find information on publicly available web pages. If there’s a login, they have no access.

robots.txt files only apply to public pages.

Now, if you copied text from a protected site and put it on your blog, Google would find it. I suspect that would violate the terms of service of the original site (and maybe even of the blog host), so you’d lose your access if it’s egregious enough.

Whatever the arrangement is, it seems that some sites have permission to use articles from, say, the New Yorker, even though you normally need a subscription to read them. Arts & Letters Daily, for example, frequently does this.

So that’s how “gating” news content is done. I had never heard of the robots.txt file before. Thank you all very much.
davidmich

Would that it were only the legitimate copyright holders making such requests. Anyone can ask Google to stop linking to anything, and there are apparently no repercussions beyond Google ignoring the most egregious asshats.

Case in point, taken directly from the requests, which Google makes public:

Here’s the whole thing.