This isn’t going to be of any help to you, but I just had to say that the voice recognition I have on my Android phone regularly creates some of the funniest statements I’ve ever read (and not said).
Version 11 allows you to load in an audio file, but my test with a phone interview delivered amusing results. It did the local side just fine, but the frequency-limited phone side had considerably worse accuracy. I wound up parroting the whole thing. It took a while to produce an accurate transcription, but it was still far faster than typing. I’m sure audio file transcription would work fine with something like a speech recorded via a local microphone.
Siri is absolutely the wrong choice. It uses Nuance Dragon technology but is not suited to what you are trying to do.
Dragon is by far the best choice, but the bad news is that it is very good for the local side and not for anything over the telephone. Speech recognition uses voice models as part of determining what was said. A voice model is trained using audio samples that are similar to the end-use environment and spoken by people with the same general speech characteristics (this is why speech recognition set up for a US English speaker will not perform well with, say, an Italian English speaker). In the case of Dragon, that training audio is recorded using microphones at a 16 kHz sampling rate. Telephone speech recognition needs a different voice model, trained on audio recorded over telephones at 8 kHz (telephones adopted 8 kHz to save bandwidth). A voice model trained with microphone audio will not do as well at recognizing audio from another source. I am not even sure if Dragon has a telephone voice model.
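To put rough numbers on the sampling-rate point, here is a minimal Python sketch. It only applies the standard sampling theorem (a signal sampled at a given rate can carry frequencies up to half that rate); it says nothing about how Dragon works internally. The function names and the 6 kHz example tone are made up for illustration.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


def representable(freq_hz, sample_rate_hz):
    """True if a tone at freq_hz fits below the Nyquist limit of the rate."""
    return freq_hz < nyquist_hz(sample_rate_hz)


def alias_hz(freq_hz, sample_rate_hz):
    """Apparent frequency of a tone sampled at sample_rate_hz with no filtering.

    Tones above the Nyquist limit fold back into the representable band.
    """
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))


# Illustrative (assumed) example: a 6 kHz component of speech.
EXAMPLE_TONE_HZ = 6000

for rate in (16000, 8000):
    print(
        f"{rate} Hz sampling: limit {nyquist_hz(rate):.0f} Hz, "
        f"6 kHz tone kept: {representable(EXAMPLE_TONE_HZ, rate)}"
    )
```

So a model trained on 16 kHz microphone audio has seen content up to 8 kHz, while telephone audio at 8 kHz carries nothing above 4 kHz. That is one plausible reason the two kinds of audio look so different to a recognizer.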
So Dragon is your best hope, but as a previous poster said, you will probably have to dictate the responses from the telephone end of the conversation yourself.
One other thing: Dragon is a dictation system. It uses a language model (millions of pieces of text processed to produce a statistical representation of which word orders are likely). Conversational speech is not the same as dictating a document, so you may initially have some accuracy problems, but a little training of Dragon will overcome that issue.
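For anyone curious what "a statistical representation of word order" can look like, here is a toy bigram model in Python. It is only a sketch of the general idea and not Dragon's actual language model, which I have no details on. The three-sentence corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_prob(counts, prev, nxt):
    """Estimated probability that nxt follows prev (0.0 if prev is unseen)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Hypothetical training text, written like dictated documents.
TOY_CORPUS = [
    "please send the report today",
    "please send the invoice today",
    "please read the report",
]

model = train_bigrams(TOY_CORPUS)
print(next_word_prob(model, "the", "report"))  # seen twice out of three
print(next_word_prob(model, "the", "um"))      # never seen after "the"
```

A filler like "um" gets zero probability here because it never appears in the document-style training text, which is roughly why conversational speech trips up a system tuned for dictation.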
In the test I did, I was trying to transcribe a Canadian radio host interviewing Kate Bush who was on the phone. Did OK with him, but was a holy mess with Kate.
I wish there was a speech recognition program that would pay attention to my parroting and train itself on what I parroted, allowing it to learn a new voice or recording medium based on the user supplying the plaintext.
If Dragon is the best, lord help us all. I found it to be useless after six months of trying to get it to recognize and type anything even close. And I had the 11.0 Professional edition, and it was barely better than the one that comes with Windows.
Seriously? Because I use it all the time, especially when transcribing interviews. I correct it and it learns how I pronounce a particular word and doesn’t make that mistake again.
Do you have a particularly thick accent, or are you using it in a very noisy environment?
I have to bump this, if only to see if anyone else has had a positive experience like mine, or a negative one like BeaMyra. Because I have been transcribing interviews recently, and it does an amazing job. All I can figure is that the people who write this software speak as I do, like an archetypical “computer geek”.