I'm using a RichTextBox control in a Windows Service to convert RTF to plain text. This method is actually mentioned by MS here: http://msdn.microsoft.com/en-us/library/cc488002.aspx
My Windows Service spawns multiple threads (typically 2 x the number of CPU cores), and each of these threads ends up instantiating a separate instance of the RichTextBox control and using it to convert RTF to plain text.
This seems to work great, except when the service is run on machines with many cores (4+). In these scenarios, the service will occasionally just lock up. The CPU spins at roughly 10%, and nothing happens. This will go on forever unless you kill the process. I finally managed to attach a debugger while it was happening, and it turns out the hang is inside the RichTextBox control. It looks like some internal lock that's taken when creating a new window handle.
I'm making sure that I dispose of the RichTextBox after every use. The lockup doesn't seem to happen if I force the service to use fewer threads, but that dramatically reduces the throughput of my service on multi-core machines.
Anybody have any ideas on how to fix this, or any idea on a better way to convert RTF to plain text?
Here is the relevant portion of the stack trace of one of the threads that's stuck:
[In a sleep, wait, or join]
> System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) Line 702 + 0x24 bytes C#
System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() Line 5632 C#
System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() Line 1478 C#
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) Line 759 C#
| | RMD Tuesday, September 29, 2009 8:36 PM | You could write your own parser. It shouldn't be too hard to make a regex to remove matching curly braces from the text, and one to remove everything after a '\' until whitespace is hit. You would have to make a few replacements, like \par going to a newline, but it would be far more efficient than instantiating a new control. You can find the latest RTF specification here: http://www.microsoft.com/downloads/details.aspx?FamilyId=DD422B8D-FF06-4207-B476-6B5396A18A2B&displaylang=en There must also be a number of ready-made RTF parsers available on the internet which you could use. | | Diggsey Tuesday, September 29, 2009 8:53 PM | That's certainly an option, although I'm absolutely horrible at regex.
I've tried and failed to find any libraries for .NET that allow me to convert RTF to plain text. I'll keep looking, though. | | RMD Tuesday, September 29, 2009 8:54 PM | Did you look hard enough? I quickly turned up this one.
Hans Passant. | | nobugz Tuesday, September 29, 2009 9:49 PM | I've actually tried to use that one before. Even with a simple RTF document it doesn't seem to work... just get lots of exceptions.
I've made some attempts to resolve the exceptions, but to be honest, the code is a classic example of over-engineering, over-refactoring, and over-reliance on programming patterns. The thing has so many levels of indirection that it's nearly impossible to figure out exactly what's going on.
Got any others? :) | | RMD Tuesday, September 29, 2009 11:31 PM |
Hi, You may also want to try this one: NRtfTree (link to external site not controlled by Microsoft; use it at your own risk). That MSDN article introduces a way to extract the plain text from RTF using the RichEdit control. However, the control is not meant to be used from multiple threads (and I don't think any of the WinForms controls are guaranteed to work that way without problems). So a much more reliable way to go is to use a non-UI library to get the job done, especially since we're talking about a service application here. Regards, Jie | | Wang, Jie Wednesday, September 30, 2009 9:56 AM | Jie,
Thanks. I've also tried NRtfTree. Again, it fails with even moderately complicated RTF documents.
It seems like, for whatever reason, the open source options for RTF parsing aren't up to par. (No pun intended. :)
I'm going to have to try and continue to use the RichTextBox control. Perhaps by having a single RichTextBox per thread that is reused I can avoid this issue. | | RMD Wednesday, September 30, 2009 4:07 PM | Such a thread has to be initialized correctly. Call Thread.SetApartmentState() to switch it to STA before you start it. And call Application.Run() to pump a message loop. Control.Invoke() is required to set any of the properties. You can't use a threadpool thread.
Hans Passant. | | nobugz Wednesday, September 30, 2009 4:34 PM |
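A minimal sketch of the setup Hans describes: a dedicated (non-thread-pool) STA thread that owns the RichTextBox, pumps messages with Application.Run(), and is only ever reached through Control.Invoke(). The class name StaRtfConverter and its members are hypothetical and do not come from this thread. It wraps a plain RichTextBox, since RMD's RTFConverter class is not shown in full.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the thread): one RichTextBox owned by one STA thread.
public sealed class StaRtfConverter : IDisposable
{
    private readonly Thread m_thread;
    private readonly ManualResetEvent m_ready = new ManualResetEvent(false);
    private RichTextBox m_box;

    public StaRtfConverter()
    {
        m_thread = new Thread(MessageLoop);
        m_thread.IsBackground = true;
        // Must be called before Start(); a thread-pool thread cannot be switched to STA.
        m_thread.SetApartmentState(ApartmentState.STA);
        m_thread.Start();
        m_ready.WaitOne(); // wait until the control and its handle exist
    }

    private void MessageLoop()
    {
        using (m_box = new RichTextBox())
        {
            // Reading Handle forces the window handle to be created on this thread.
            IntPtr handle = m_box.Handle;
            m_ready.Set();
            Application.Run(); // pump messages until ExitThread() is called
        }
    }

    // Safe to call from any thread: the work is marshaled to the owning thread.
    public string ToPlainText(string rtf)
    {
        return (string)m_box.Invoke(new Func<string, string>(ConvertOnOwnerThread), rtf);
    }

    private string ConvertOnOwnerThread(string rtf)
    {
        m_box.Rtf = rtf;
        return m_box.Text;
    }

    public void Dispose()
    {
        // End the message loop from the owning thread, then wait for that thread to finish.
        m_box.Invoke(new MethodInvoker(Application.ExitThread));
        m_thread.Join();
        m_ready.Close();
    }
}
```

Worker threads would call ToPlainText() on a shared instance. Because every conversion is marshaled onto one thread, a single instance serializes all conversions, so a service that needs parallel throughput would presumably create several of these.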
I've implemented what amounts to a thread-specific object pool. Here is some relevant code:
using System.Collections.Generic;
using System.Threading;

public class RTFConverterPool
{
    private readonly object m_RTFConverterPoolLock = new object();
    // Maps a managed thread ID to the converter owned by that thread.
    private readonly Dictionary<int, RTFConverter> m_RTFConverterPool = new Dictionary<int, RTFConverter>();

    public RTFConverter ThreadSafeRTFConverter
    {
        get
        {
            RTFConverter rtfConverter;
            int currentThreadId = Thread.CurrentThread.ManagedThreadId;
            lock (m_RTFConverterPoolLock)
            {
                if (!m_RTFConverterPool.TryGetValue(currentThreadId, out rtfConverter))
                {
                    // First request from this thread: create and remember its converter.
                    rtfConverter = new RTFConverter(string.Empty, 800, 0, 0);
                    m_RTFConverterPool.Add(currentThreadId, rtfConverter);
                }
            }
            return rtfConverter;
        }
    }
}
Each thread in my thread pool then calls the ThreadSafeRTFConverter property to get its own thread-specific instance.
The code seems to work, although I haven't run enough testing to know if it resolves the original problem of having the service lock up. It's difficult for me to reproduce this on my desktop.
Why do I need to set the apartment state to STA? It was my understanding that this was only required for COM interop. (Which, I guess, the RichTextBox could be using internally.) Also, what will Application.Run() do for me in this instance? Are you thinking that the RichTextBox needs to have a message loop running for this locking not to occur?
If only the thread that creates the RichTextBox ever accesses it, I'm assuming I don't need to use Invoke. | | RMD Wednesday, September 30, 2009 5:48 PM | (yes), yes and yes. Hans Passant. | | nobugz Wednesday, September 30, 2009 5:53 PM |
Ok, so if I modify my pool class to call Application.Run() when it creates a new RTFConverter instance, this will make sure that the thread in question is pumping messages correctly. Right? So the only thing that's left is to make sure that the RichTextBox can interop with COM appropriately, and to do that I'll need to make my threads STA. What other effects will that have on my application? I'm using the SmartThreadPool, so I'll have to figure out if it's even possible to run STA threads with it. | | RMD Wednesday, September 30, 2009 6:01 PM | I ended up using ThreadStatic variables to hold the RTFConverter (RichTextBox) instances for each thread. This seems to work: I ran a large-scale (72-hour) test and it never locked up. | | RMD Monday, October 05, 2009 4:56 PM |
|
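The ThreadStatic approach RMD reports could look roughly like the sketch below. This is an illustration only: the class name ThreadLocalRtfConverter and its methods are hypothetical, and it holds a plain RichTextBox because the RTFConverter wrapper is not shown in the thread. It also leaves out the STA and message-loop setup discussed earlier, so treat it as the shape of the idea and not a verified fix.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper (not RMD's actual code): one reusable RichTextBox per thread.
public static class ThreadLocalRtfConverter
{
    // [ThreadStatic] gives every thread its own copy of this field.
    // Do not add a field initializer: it would only run once, on the
    // thread that triggers the type initializer.
    [ThreadStatic]
    private static RichTextBox t_box;

    public static string ToPlainText(string rtf)
    {
        if (t_box == null)
        {
            // Lazily create the control on the thread that will use it.
            t_box = new RichTextBox();
        }
        t_box.Rtf = rtf;
        return t_box.Text;
    }

    // Each worker thread should call this before it exits so its window handle is released.
    public static void ReleaseForCurrentThread()
    {
        if (t_box != null)
        {
            t_box.Dispose();
            t_box = null;
        }
    }
}
```

Compared with the dictionary-based pool, this needs no lock and no thread-ID lookup, and the control is only ever touched by the thread that created it, which is why Invoke is not used here.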