Can this be done in HTML? (text manipulation)

Here is my problem. Please forgive the length of the explanation, and feel free to ask questions if anything needs to be clarified.

I import long strings of text into Excel. I use a spreadsheet with HTML tags interspersed throughout it to serve as a sort of template. The long text strings are embedded in between the HTML tags. The only format in which I can save the file that is eventually converted to HTML is the .prn format. All the other plain-text formats Excel has use tabs, quotation marks or other foreign characters that would screw up the HTML coding. Once I have my .prn file I use a small custom-written applet to strip out the extra spaces in the .prn file, and then save the output as an HTML file.

This system has worked well for me, but one of the limitations I face with using .prn is that it cuts off the long text strings and appends the remainder of these strings at the bottom of the file. These remainder strings are cut off when the final HMTL file is generated. What I have to do is manually reinsert the strings of text that were originally truncated into the HTML files.

I had experimented with developing complex formulas in Excel that would divide a long string of text into separate lines. It would even make the breaks in between whole words so that a word wouldn’t be cut off in the middle. Usually, however, I would get errors and I could not figure out what caused them when all I was doing was having the next line start at a certain point in the text string following the occurrence of a space, which marked where the next whole word would start.

I gave up on this after much trial and error and decided to just break the text into strings of the same length, regardless of where the next word began. This introduces the problem of having words split in the middle. If it could be done in HTML, I would like be able to join several strings together and have them appear as a single body of text, but as far as I know there is no way this can be done in HTML.

Here is a simplified example to help illustrate my problem.

Let’s say the text string is the Pledge of Allegiance. (the actual text strings can be much longer than this, but remember this is simplified for the example). I want to divide the text string into 40-character segments, which I know how to do in Excel. The lines would appear as follows:



I Pledge Allegiance to the flag of the U
nited States of America and to the Repub
lic for which it stands, one Nation unde
r God, indivisible, with liberty and jus
tice for all.


What I want to know is, is there a way in HTML that I could string each of these segments together so that they would appear as one contigious string? The ways I have tried it put spaces where the line breaks occur.

Do you have access to any server side scripting? ASP or Cold Fusion perhaps?

I understood you correctly you have some data in excel like this

cell a1: {BODY}
cell a2: my text goes here
cell a3: {B}
cell a4: my bolded text goes here

… or …

cell a1: {BODY}
cell b1: my text goes here
cell a2: {B}
cell b2: my bolded text goes here

and so on. Then you save it as prn.

A HTML file is just a text file and you can only do cut & paste with it. You could of course load the text string strings into a javascript array, but frankly that’s to much trouble.

Am not familiar with the prn format, but it appears to me that you could export the data (instead of using Save As), or use one of the other formats which inserts a strange symbol that wont appear anywhere in your text, then just strip the text file of the strange symbol.

I could also suggest that you rather create a Word template with two columns, the left for html code, and the right for the text strings (cells just like in excel). When you save that file as plain text, the cell simply disappears. Or get an editor. Honestly. excel is not a good tool to use for html.

No, I don’t believe you could do what you want with just html. You might be able to put something together in javascript, but I wouldn’t recommend it since that would mean building the pages with the extra spaces and then depending on the browser to put it together properly client-side. This would be exceedingly ugly and unreliable.

I agree with Alien that Excel it probably a poor tool for the task, I would tend to write a perl script, but you have what you have.

If you just let the .prn export have its way with your data, how wide are the resulting chunks? It appears the max width is 255, if you’re getting less than that, check the column width.

What happens if you save to .txt rather than .prn? It seems to produce longer lines for me, but that’s just in my 30 second test…

You could try writing your own export function–I would assume that will give you more control.

No way that I know of to compress the whitespace like that.

Out of curiosity, though, what prevents you from using another Excel format? Taba and quotation marks shouldn’t really be screwing up HTML at all. (I’m not familiar with what you’re doing, though, so I admit that there could be other characters that I don’t know about that prevent you from using those other formats.)

Stupid question: Since you’re running the Excel output through an applet anyways, have you tried the applet filtering out the tabs and commas from a different export format? Alternately the applet could stitch the strings back together. I don’t have a copy of Excel to see just what the prn format looks like. Basically, since there’s java in the process anyways, I think you should use that instead of adding another step.

Hal

Thanks for the responses, everyone.

tastycorn- I do have access to server-side scripting, though something of this sort is beyond my expertise. The guy who hosts my site may be able to help me if I cannot develop any solutions to remedy my problem on my end.

Alien- My HTML template is similar to what you described. Most of the data goes into a table like this:



                  A                                       B
  1<table border=1><tr><td><b>             Question 1</b>
  2<td><font color=blue>                   =[survey_data.xls]data!$a$1
  3<tr><td><b>                             Question 2</b>
  4<td><font color=blue>                   =[survey_data.xls]data!$a$2
  5<tr><td><b>                             Question 3</b>
  6<td><font color=blue>                   =[survey_data.xls]data!$a$3
  7</table>


This is then saved as a .prn file. The data in the formula is where the text strings appear. If these strings are too long they run off the edge and the remainder appears at the bottom. After I run the .prn file through the applet that strips the spaces out the file will look somewhat like this:



<table border=1><tr><td><b>Question 1</b>
<td><font color=blue>I Pledge Allegiance to the flag of the U
<tr><td><b>Question 2</b>
<td><font color=blue>I Pledge Allegiance to the flag of the U
<tr><td><b>Question 3</b>
<td><font color=blue>I Pledge Allegiance to the flag of the U
</table>

nited States of America and to the Repub
lic for which it stands, one Nation unde
r God, indivisible, with liberty and jus
tice for all.

nited States of America and to the Repub
lic for which it stands, one Nation unde
r God, indivisible, with liberty and jus
tice for all.

nited States of America and to the Repub
lic for which it stands, one Nation unde
r God, indivisible, with liberty and jus
tice for all.


The .prn format will save an Excel file so that all the columns are aligned. This alignment is created by padding out the unused characters in each cell with spaces (see first coded example above). Excel will export directly to HTML, but this creates such large files with extraneous HTML code and all I need is a few simple HTML tags (for building tables).

If Word can refer to data from other cells I might consider going this route. I don’t want to have to copy and paste everything into the cells, though. Even though Excel isn’t really meant for HTML editing, I’ve figured how to use it this way for my purposes and it does 95% of the job on its own. I just need to figure out a way to get around the limitations I am up against, either by using Excel or by using HTML tags that will join the string segments without white space in between them.

zoltar7 and Hauky- I am not well versed in Java, so that option is out. Ditto for Perl. Maybe I should just sit down and take the time to familiarize myself with these things.

The .prn does have a maximum width of 256 characters. If I save to .txt the data is bounded by quotation marks, which I would have to manually find-and-replace to strip out. The custom applet does not work this way and I have lost contact with the person who was kind enough to write it for me. While the tab characters aren’t a problem, the option that allows me to save with tab delimiters still puts in those pesky quotation marks around each cell.

Thanks for the link, zoltar7. I will look at it and see if maybe this presents the solution I am looking for.

Winsling- The applet was custom-written. If I can find the email address person who originally wrote it I might ask him to add additional functionality to it.

Ok, correct me if I’m wrong, but the scenario is this. You have along list of structurally similar data (pledges :)) that you like to work with in excel, but need to display on a web site.

You don’t like working in visual HTML editors, possibly because you need to process the data while in Excel and importing/exporting or copy and pasting would be cumbersome.

So, the challenge is to get the data from Excel, along with the surrounding formatting information into HTML with the minimum of effort?

Given this problem, the way I would proceed is a follows: (kind of technical, but if you think it would work, I can provide detail)

Define the data you would like to export as a named range in Excel.
Set up a data source on the host computer to point to the Spreadsheet. (makes the sheet act like a database)
Write an ASP template that takes the three cells on each row (formatting, content, formatting) and simply outputs them.<%formatting_begin%><%content%><%formatting_end%><br>
Loop this so it repeats for each row.
Test each row so you know when it runs out of data.

To update the html page, simply export the entire Excel file to the web server and overwrite the old one.

This process bypasses any need for export, and should completely fix the text truncation and space issues.

Now, if I were writing this myself, I would use Cold Fusion to open a COM instance for Excel, open the file, select the sheet, read the data from each cell, insert into a database or write a html file, save the file, close the Excel file, kill the COM. Not everyone likes this method though.

Here’s a quick and (fairly, I hope) simple solution:

Create a new text file with a “.VBS” extension. Paste the following code into it and save it:



'<script language="VBScript">
' Don't change this constant
Const XLCONN = "DRIVER=Microsoft Excel Driver (*.xls);UID=admin;UserCommitSync=Yes;Threads=3;SafeTransactions=0;ReadOnly=1;PageTimeout=5;MaxScanRows=8;MaxBufferSize=2048;FIL=excel 8.0;DriverId=790"

' Path to directory containing XLS workbook
Const XLCONN_DEFAULTDIR = "F:\Documents and Settings\armilla\My Documents"
' Name of Excel File
Const XLCONN_DBQ="Book1.xls"
' Path to XSL File
Const XSLFILE = "F:\Documents and Settings\armilla\My Documents\dumpXLRS.xsl"
' Path to output file
Const OUTPUTFILE = "c:	est.htm"
' Name of range to select from XLS file
Const XLRANGE = "Alpha"

Dim strConn, objConn, objRS, objFS, objTS, objField, i, strLine
Dim objXML, objXSL

strConn = XLCONN & ";DefaultDir=" & XLCONN_DEFAULTDIR & ";DBQ=" & XLCONN_DEFAULTDIR & "\" & XLCONN_DBQ
Set objConn = CreateObject("ADODB.Connection")
objConn.Open strConn
Set objRS = objConn.Execute("SELECT * from " & XLRANGE)

Set objFS = CreateObject("Scripting.FileSystemObject")
Set objTS = objFS.OpenTextFile(OUTPUTFILE,2,true)

Set objXML = CreateObject("MSXML2.DOMDocument.3.0")
Set objXSL = CreateObject("MSXML2.DOMDocument.3.0")

objRS.Save objXML,1

objXSL.async=False
objXSL.load XSLFILE
strLine = objXML.transformNode(objXSL)
objTS.WriteLine strLine

objRS.Close
Set objRS = Nothing
objTS.Close
Set objTS = Nothing
objConn.Close
Set objConn = Nothing
Set objFS = Nothing

WScript.Echo "Done"


Now create another new text file with an “.XSL” extension. Paste the following into it:



<xsl:stylesheet 
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
	xmlns:msxsl="urn:schemas-microsoft-com:xslt" 
	xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'
	xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'
	xmlns:rs='urn:schemas-microsoft-com:rowset'
	xmlns:z='#RowsetSchema'
	version="1.0">

	<xsl:output method="text"/> 

	<xsl:template match="z:row">
		<xsl:apply-templates select="@*" mode="row"/>
	</xsl:template>
	
	<xsl:template match="@*" mode="row">
		<xsl:value-of select="."/>
	</xsl:template>

</xsl:stylesheet>


Now, open your spreadsheet and, as tastycorn suggested create a named range around the data in your spreadsheet (select the cells, maybe with a bit extra for expansion needs, then go to Insert/Name/Define on the menu bar). I called my range “Alpha”.

Now, open the VBS file in an editor like Notepad. You’ll need to edit the constants defined at the top of the file to point to the right places for the XLS file and the XSL file. You’ll also need to make sure the name of the range is correct.

Finally, run the VBS file by double-clicking on it. It should open the spreadsheet, extract your data as XML and then transform it into an HTML file suitable for your use.

Problems you might encounter:
You may not have the right versions of MSMDAC or MSXML on your system. Download the MDAC2.7 sp1 from here and the MSXML4 release from here.

Hmm. Looks a little more complex than I’d hoped. Feel free to email me if you need any help.

On submission I notice that the board’s splitting one of the lines in the VBS file. Make sure that everything from “Const XLCONN =” to “DriverId=790"” is all on the same line. This line is right at the top of the file.