I need to search my company’s website for a particular image, to see how many times it occurs throughout the site.
Because the file has a unique name, I thought, ‘easy! I’ll just type the file name into Google.’ But it looks like Google only searches the visible text of a page and doesn’t index the underlying HTML, which is where the image filename actually lives.
Can anyone suggest a way to find how many times an image occurs on a website? Perhaps a search engine that doesn’t filter out HTML code?
(Telnetting in and searching directly on the Unix server is not an option, btw.)
Have you got (or can you get) the ability to FTP into the server? Then it would be a simple matter to check each .html or .htm document manually, or you could automate it with a PHP or Perl script.
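If you do get the files down to a local folder (over FTP or otherwise), the Perl version of that search could be as simple as the rough, untested sketch below. The folder name and image filename are just placeholders for whatever yours really are.

#!/usr/bin/perl
# Rough sketch, not tested against any real site: walk a local copy of the
# site and count how many times an image filename appears in each .html/.htm
# file.  The directory and filename below are placeholders.
use strict;
use warnings;
use File::Find;

my $dir   = 'local_site_copy';           # placeholder folder holding the pages
my $image = 'unique_logo_filename.gif';  # placeholder image filename

my $total = 0;
find(
    sub {
        return unless -f $_ && /\.html?$/i;
        open my $fh, '<', $_ or return;
        local $/;                        # slurp the whole file at once
        my $html  = <$fh>;
        my $count = () = $html =~ /\Q$image\E/g;
        if ($count) {
            print "$File::Find::name : $count\n";
            $total += $count;
        }
    },
    $dir
);
print "Total occurrences: $total\n";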
Probably too late for this idea to work for you now, but I always keep a complete mirror-image copy of all the data for every web site in a folder on my local PC or network, down to the exact directory structure. Then I can use all the standard search tools from a PC, including some dynamite, lightning-fast DOS stuff.
I can’t imagine doing it any other way. In your case, would FTPing all the data down to your local PC be an option? (Just in case PHP or Perl isn’t in timgregory’s vocabulary, Q.E.D.)
Google has an image search feature. Go to their main page and click the “Images” tab. It appears to work based on filename, and the advanced search allows you to specify a domain. It didn’t work on a quick search for an image name on one of my domains, though, so either they haven’t indexed me or it’s not really using the filename and is relying on alt tags or context instead.
Another option would be to use an offline browser like WebWhacker (commercial, but there are probably free alternatives with similar functionality). This might be easier if you don’t have FTP access to download the files since this kind of utility will spider a site and create a local copy for you. You could then search that local copy using a variety of tools.
FTPing is not an option, sadly. For security reasons, users at my level don’t have direct access to any of the web servers; we access the pages through a file-management tool (one without a search feature).
Keeping local copies through this tool would be cumbersome at best. Plus I just started this job, so I haven’t touched many of the pages yet!
Google’s image search didn’t work either, although Google can indeed find our pages (if I type in a phrase from any page).
I’ll check out WebWhacker and any similar (free) tools. Thanks!
In the meantime, if anyone else can think of a decent workaround for my lame server access, I’d appreciate it! Thanks.
Search for WebStripper; it will download the whole website by following the linked pages on the domain, and you can then use those files to see how many times the image appears.
1. Call the web person and ask either for the answer or for access to a copy so you can find the answer yourself.
2. Write a Perl script to spider through the pages and search for the filename you’re looking for. Check out lwp-rget for a start (rough sketch below).
3. Download the site and grep through it. Black Widow is a good tool for grabbing a copy.

Note that for options #2 and #3, you’re relying on the site being easily spiderable. So if it has Flash menus or complex JavaScript or whatever, it won’t work.
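For option #2, something along these lines is the general idea. This is a rough, untested sketch that leans on LWP::UserAgent and HTML::LinkExtor (both in the standard libwww-perl bundle) rather than lwp-rget itself, and the starting URL and image filename are placeholders:

#!/usr/bin/perl
# Rough sketch only: crawl pages on one host, count occurrences of an image
# filename in each page's HTML source, and print a running total.
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $start = 'http://www.example.com/';   # placeholder starting page
my $image = 'unique_logo_filename.gif';  # placeholder image filename

my $ua         = LWP::UserAgent->new( timeout => 15 );
my $start_host = URI->new($start)->host;
my %seen;                                # pages already fetched
my @queue = ($start);
my $total = 0;

while ( my $url = shift @queue ) {
    next if $seen{$url}++;
    last if keys %seen > 500;            # crude cap so a runaway crawl stops

    my $res = $ua->get($url);
    next unless $res->is_success && $res->content_type eq 'text/html';
    my $html = $res->decoded_content;

    # Count how many times the filename appears in this page's source.
    my $count = () = $html =~ /\Q$image\E/g;
    if ($count) {
        print "$url : $count\n";
        $total += $count;
    }

    # Pull out the links and queue the ones that stay on the same host.
    my $parser = HTML::LinkExtor->new( undef, $url );
    $parser->parse($html);
    $parser->eof;
    for my $link ( $parser->links ) {
        my ( $tag, %attrs ) = @$link;
        next unless $tag eq 'a' && $attrs{href};
        my $abs = URI->new_abs( $attrs{href}, $url );
        next unless $abs->scheme && $abs->scheme =~ /^https?$/;
        $abs->fragment(undef);           # drop #anchors so pages aren't fetched twice
        push @queue, "$abs" if $abs->host eq $start_host;
    }
}

print "Total occurrences: $total\n";

Option #3 is the same counting step, just run over whatever Black Widow (or WebStripper, mentioned above) saves to disk instead of fetching the pages live.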