In short, I have a large HTML document that I want to grab all the text from. Should be as easy as copying and pasting it, right? Well, not quite. There’s a bunch of tables and images interspersed throughout that I don’t want included.
So is there a way to grab all the text, sans images and other non-text things?
essell
April 28, 2007, 9:31am
2
What are you pasting into?
Into Word or similar?
In the browser: Select All, then Copy.
Then in Word: Edit > Paste Special > Unformatted Text.
Or paste the web page into Notepad, save the file, then copy and paste the text into whatever. Same deal really.
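If the document is too large to handle comfortably by copy and paste, the same idea can be scripted. The following is only a rough sketch using Python's standard `html.parser` module; the names `TextOnly` and `html_to_text` are made up for illustration, and the choice to drop everything inside `table`, `script`, and `style` tags is an assumption based on the original question. Images carry no text of their own, so they simply drop out.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text, ignoring anything inside the SKIP tags."""

    # Assumption: tables are unwanted, as in the original question.
    SKIP = {"script", "style", "table"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many SKIP tags we are currently inside
        self.chunks = []  # pieces of text kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only when we are outside every SKIP tag.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string, one chunk per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = "<p>Hello</p><img src='a.png'><table><tr><td>cell</td></tr></table><p>World</p>"
    print(html_to_text(sample))
```

One known limitation of this sketch: inline tags such as bold or links split a sentence into separate chunks, so the output may need some line joining afterwards.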
LSLGuy
April 28, 2007, 11:07am
3
Another way: view the HTML doc in a browser and choose “Save As…”. For file type, choose “Text File”.
The file will include a bunch of tabs to crudely represent some formatting, but you can remove all that with a single find/replace. For example, this is the current page saved as text & pasted back into a CODE tag in this post:
Straight Dope Message Board - Is there a way to just copy the text from a webpage?
Straight Dope Message Board > General Questions
Is there a way to just copy the text from a webpage?
Welcome, LSLGuy.
You last visited: Yesterday at 10:39 PM
Private Messages: Unread 0, Total 0.
User CPRulesSubscribeFAQMembers ListCalendarNew PostsSearch Quick Links
Log Out
Search Forums
Advanced Search
Quick Links
New Posts
Mark Forums Read
Open Buddy List
User Control Panel
Edit Signature
Edit Profile
Edit Options
Miscellaneous
Private Messages
Subscribed Threads
My Profile
View First Unread Thread Tools Search this Thread Display Modes
Today, 04:12 AM #1
Red Barchetta
Member Join Date: May 2006
Posts: 656
Location: Bay Area, California
Is there a way to just copy the text from a webpage?
In short, I have a large HTML document that I want to grab all the text from.
Should be as easy as copying and pasting it, right? Well, not quite. There's a
bunch of tables and images interspersed throughout that I don't want included.
So is there a way to grab all the text, sans images and other non-text things?
Red Barchetta
View Public Profile
Send a private message to Red Barchetta
Send Email to Red Barchetta
Find all posts by Red Barchetta
Today, 04:31 AM #2
essell
Member Join Date: Sep 2006
Posts: 301
Location: UK
Quote:
Originally Posted by Red Barchetta
In short, I have a large HTML document that I want to grab all the text
from. Should be as easy as copying and pasting it, right? Well, not quite.
There's a bunch of tables and images interspersed throughout that I don't
want included.
So is there a way to grab all the text, sans images and other non-text
things?
What are you pasting into?
Into word or similar?
Select all
Copy.
Then into Word..
Edit
Paste Special
Unformatted Text
Or paste the web page into notepad, save the file then copy and paste the text
into whatever. Same deal really.
essell
View Public Profile
Send a private message to essell
Send Email to essell
Find all posts by essell
Advertisements
« Previous Thread | Next Thread »
Quick Reply
Message:
Options
Show your signature Quote message in reply?
Thread Tools
Show Printable Version
Email this Page
Subscribe to this Thread
Display Modes
Linear Mode
Switch to Hybrid Mode
Switch to Threaded Mode
Search this Thread
Advanced Search
Rate This Thread
You have already rated this thread
Posting Rules
You may post new threads
You may post replies
You may not post attachments
You may edit your posts
vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump
Please select one User Control Panel Private Messages
Subscriptions Who's Online Search Forums Forums Home
-------------------- About This Message Board Comments on
Cecil's Columns Comments on Staff Reports General Questions
Great Debates Cafe Society In My Humble Opinion Mundane
Pointless Stuff I Must Share (MPSIMS) The BBQ Pit
All times are GMT -5. The time now is 06:05 AM.
Contact Us - Straight Dope Homepage - Archive - Top
Powered by: vBulletin Version 3.0.7
Copyright ©2000 - 2007, Jelsoft Enterprises Ltd.
The Straight Dope / Questions or comments for Cecil Adams to:
cecil@chicagoreader.com
Comments regarding this website to: webmaster@straightdope.com
For advertising information, see the Chicago Reader Online Rate Sheet
"The Straight Dope by Cecil Adams" is a registered trademark of Chicago Reader,
Inc. Contents of the Straight Dope Message Board and the Straight Dope Web site
are copyright 1984-2007 by the Chicago Reader, Inc. All rights reserved. By
posting on this board you grant the Chicago Reader, Inc., and its successors and
assigns a nonexclusive irrevocable right to re-use your posting in any manner it
or they see fit without notice or compensation to you. No material contained in
this site may be republished or reposted without express written consent of the
Chicago Reader, Inc., except that message board users retain the right to
republish or repost their own work.
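The find/replace cleanup mentioned above can also be done with a few lines of Python. This is a minimal sketch, assuming the saved file uses tabs and runs of spaces to fake the layout; the function name `tidy_saved_text` is hypothetical.

```python
import re


def tidy_saved_text(text):
    """Collapse tabs and repeated spaces, and drop blank lines."""
    lines = []
    for line in text.splitlines():
        # Turn any run of tabs/spaces into one space, then trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(tidy_saved_text("\t\tHello\t world \n\n\t\nBye"))
```

This only tidies whitespace; menu and footer text like the lines shown above would still need to be deleted by hand.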
m1k3g
April 28, 2007, 12:20pm
4
There’s a very handy free utility you can download called PureText:
http://www.stevemiller.net/puretext/
Here’s the description from the site:
“Have you ever copied some text from a web page or a document and then wanted to paste it as simple text into another application without getting all the formatting from the original source? PureText makes this simple by adding a new Windows hot-key (default is WINDOWS+V) that allows you to paste text to any application without formatting.”
Very handy tool IMO.