Is there a way to just copy the text from a webpage?

In short, I have a large HTML document that I want to grab all the text from. Should be as easy as copying and pasting it, right? Well, not quite. There’s a bunch of tables and images interspersed throughout that I don’t want included.

So is there a way to grab all the text, sans images and other non-text things?

What are you pasting into?
Into Word or similar?

Select all
Copy.

Then into Word…

Edit
Paste Special
Unformatted Text

Or paste the web page into Notepad, save the file, then copy and paste the text into whatever. Same deal really.
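
If the document is big enough that doing this by hand gets tedious, the same idea can be scripted. Here is a rough sketch using only Python's standard library. It is not anything from the posts above: the names `TextOnly` and `html_to_text` are made up for illustration, and the choice to skip everything inside `table`, `script` and `style` tags is an assumption based on the original question. Images drop out on their own because an `img` tag carries no text content.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text from HTML, ignoring anything inside the SKIP tags."""

    # Assumption: these are the elements whose contents we do not want.
    SKIP = {"table", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # number of SKIP elements we are currently inside
        self.chunks = []  # pieces of text kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the kept text, one run of text per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        "<p>Keep this.</p>"
        "<table><tr><td>Drop this.</td></tr></table>"
        "<img src='pic.png' alt='a picture'>"
        "<p>Keep this too.</p>"
    )
    print(html_to_text(sample))
```

Two caveats: each run of text between tags ends up on its own line, so a sentence with bold or linked words in it gets split up, and a page with an unclosed table tag would lose everything after it. For a one-off job, Paste Special is probably still quicker.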

Another way: view the HTML doc in a browser, choose “Save As…”, and for file type choose “Text File”.

The file will include a bunch of tabs to crudely represent some formatting, but you can remove all that with a single find/replace. For example, this is the current page saved as text & pasted back into a CODE tag in this post:


Straight Dope Message Board - Is there a way to just copy the text from a webpage? 
       
       
             Straight Dope Message Board > General Questions 
             Is there a way to just copy the text from a webpage? 
      Welcome, LSLGuy.
      You last visited: Yesterday at 10:39 PM 
      Private Messages: Unread 0, Total 0. 

      User CPRulesSubscribeFAQMembers ListCalendarNew PostsSearch Quick Links 
      Log Out

      Search Forums
         

      Advanced Search
      Quick Links
      New Posts
      Mark Forums Read
      Open Buddy List
      User Control Panel
      Edit Signature
      Edit Profile
      Edit Options
      Miscellaneous
      Private Messages
      Subscribed Threads
      My Profile

       View First Unread   Thread Tools  Search this Thread  Display Modes  

       Today, 04:12 AM   #1  

      Red Barchetta  
      Member Join Date: May 2006
      Posts: 656
      Location: Bay Area, California 

Is there a way to just copy the text from a webpage? 

In short, I have a large HTML document that I want to grab all the text from. 
Should be as easy as copying and pasting it, right? Well, not quite. There's a 
bunch of tables and images interspersed throughout that I don't want included.

So is there a way to grab all the text, sans images and other non-text things?
  
      Red Barchetta
      View Public Profile
      Send a private message to Red Barchetta
      Send Email to Red Barchetta
      Find all posts by Red Barchetta

       Today, 04:31 AM   #2  

      essell  
      Member Join Date: Sep 2006
      Posts: 301
      Location: UK 

Quote:
      Originally Posted by Red Barchetta
      In short, I have a large HTML document that I want to grab all the text 
      from. Should be as easy as copying and pasting it, right? Well, not quite. 
      There's a bunch of tables and images interspersed throughout that I don't 
      want included.

      So is there a way to grab all the text, sans images and other non-text 
      things?

What are you pasting into?
Into word or similar?

Select all
Copy.

Then into Word..

Edit
Paste Special
Unformatted Text

Or paste the web page into notepad, save the file then copy and paste the text 
into whatever. Same deal really.
  
      essell
      View Public Profile
      Send a private message to essell
      Send Email to essell
      Find all posts by essell

      Advertisements

« Previous Thread | Next Thread » 
        Quick Reply 
            Message:
             
            Options
            Show your signature Quote message in reply?  

               

      Thread Tools
       Show Printable Version
       Email this Page
       Subscribe to this Thread 
      Display Modes
       Linear Mode
       Switch to Hybrid Mode
       Switch to Threaded Mode
      Search this Thread
            

      Advanced Search
      Rate This Thread
      You have already rated this thread

             Posting Rules 
            You may post new threads
            You may post replies
            You may not post attachments
            You may edit your posts

            vB code is On
            Smilies are On
            [IMG] code is Off
            HTML code is Off
              Forum Jump
                Please select one User Control Panel Private Messages 
                Subscriptions Who's Online Search Forums Forums Home 
                --------------------   About This Message Board Comments on 
                Cecil's Columns Comments on Staff Reports General Questions 
                Great Debates Cafe Society In My Humble Opinion Mundane 
                Pointless Stuff I Must Share (MPSIMS) The BBQ Pit   

All times are GMT -5. The time now is 06:05 AM.
      Contact Us - Straight Dope Homepage - Archive - Top 
Powered by: vBulletin Version 3.0.7
Copyright ©2000 - 2007, Jelsoft Enterprises Ltd. 
The Straight Dope / Questions or comments for Cecil Adams to: 
cecil@chicagoreader.com
Comments regarding this website to: webmaster@straightdope.com
For advertising information, see the Chicago Reader Online Rate Sheet
"The Straight Dope by Cecil Adams" is a registered trademark of Chicago Reader, 
Inc. Contents of the Straight Dope Message Board and the Straight Dope Web site 
are copyright 1984-2007 by the Chicago Reader, Inc. All rights reserved. By 
posting on this board you grant the Chicago Reader, Inc., and its successors and 
assigns a nonexclusive irrevocable right to re-use your posting in any manner it 
or they see fit without notice or compensation to you. No material contained in 
this site may be republished or reposted without express written consent of the 
Chicago Reader, Inc., except that message board users retain the right to 
republish or repost their own work.
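
The find/replace cleanup mentioned before the dump can also be done with a few lines of script. This is only a sketch under some assumptions: the function name `tidy_saved_text` is invented here, and it assumes the leading indentation and the runs of blank lines carry no meaning you want to keep.

```python
import re


def tidy_saved_text(text):
    """Strip per-line indentation and squeeze runs of blank lines."""
    # Remove leading and trailing whitespace (tabs or spaces) on every line.
    lines = [line.strip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    # Collapse three or more newlines in a row down to one blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


if __name__ == "__main__":
    saved = "      Welcome, LSLGuy.\n      \n       \n\n      Log Out\n"
    print(tidy_saved_text(saved))
```

This only tidies the whitespace. The menu and footer text the browser saved along with the posts would still have to be deleted by hand.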

There’s a very handy free utility you can download called ‘PureText’:

http://www.stevemiller.net/puretext/

Here’s the description from the site:

“Have you ever copied some text from a web page or a document and then wanted to paste it as simple text into another application without getting all the formatting from the original source? PureText makes this simple by adding a new Windows hot-key (default is WINDOWS+V) that allows you to paste text to any application without formatting.”

Very handy tool IMO.