quoth http://groups.google.com/group/get-theinfo/browse_thread/thread/765e12ff2cbeaf13?pli=1
Most EDGAR docs (but not all) are available in a very poorly adhered-to markup language - no OCR required, but you do need to apply many heuristics to determine the intent of whoever wrote the document. For example, from memory, table columns are defined with a <C> tag, but this tag simply marks the tab stop that delimits the column; it is not an XML-style tag. So once you have found a table with columns, you need to do some guesswork to determine what is a header, versus units, versus actual column content.
…
Every financial firm that I know of just pays boatloads of cash to one of Reuters, Bloomberg, etc., who have armies of ‘encoders’ who manually enter the data into a normalized format. And even then, if you pony up the minimum of ~$10k/month for these feeds, you get to deal with the joy of an entirely different set of problems - ones you never get to scratch the surface of when you write your own parser, because you never get far enough along with that problem. The second-order problems come from company restatements, changing accounting standards, changing reporting periods, etc.
Can anybody comment on or elaborate on these claims, either about the market for these parsing services or about the “state of the art” mechanics of parsing the filing documents?
Do they really use lots of data-entry people? Have they tried building visual, human-controlled productivity tools that would grab table contents regardless of the ad hoc HTML formatting issues?
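
For reference, here is a minimal sketch of the <C> heuristic the quote describes, assuming the old SGML-style EDGAR convention where a template line of <S>/<C> tags marks the tab stops for the stub and data columns. The function name and the sample table are invented for illustration; real filings deviate from this constantly, so treat the output as a first guess, not ground truth.

import re

def parse_edgar_table(block):
    """Heuristically split an old-style EDGAR <TABLE> block into rows of cells.

    Assumes a template line made up only of <S>/<C> tags marks the column
    tab stops. Deciding which rows are headers, units, or data still takes
    per-filing guesswork, as the quote says.
    """
    stops = None
    rows = []
    for line in block.splitlines():
        # A template line consists solely of <S>/<C> tags and whitespace.
        if re.fullmatch(r"(?:\s*<[SC]>)+\s*", line):
            stops = [m.start() for m in re.finditer(r"<[SC]>", line)]
            continue
        # Skip everything before the template line, blanks, and other tags.
        if stops is None or not line.strip() or line.lstrip().startswith("<"):
            continue
        # Slice each data line at the tab stops the template defined.
        cells = [
            line[start:(stops[i + 1] if i + 1 < len(stops) else None)].strip()
            for i, start in enumerate(stops)
        ]
        rows.append(cells)
    return rows

sample = """<TABLE>
<CAPTION>
                           2008       2007
<S>                        <C>        <C>
Revenue                    1,204      1,100
Net income                   310        295
</TABLE>"""

for row in parse_edgar_table(sample):
    print(row)  # e.g. ['Revenue', '1,204', '1,100']

Even in this toy case, note how much is left unhandled: nothing distinguishes the year headers from data, numbers are right-aligned so they can drift across tab stops, and multi-line stubs break the slicing entirely - which is presumably why the quote calls it guesswork.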