I've got two Drupal modules that depend on reading through text to do something useful. The first is the 'Drupal Markup Engine', or dme. The second, currently called 'pingv_coder', is an extension to coder that checks for what I think are poor programming practices; it isn't done yet.
In both of them I'm currently using regular expressions to scan for what I need. In the first, I'm using some truly twisted regular expressions to find and isolate the tags that are left in the text. DME allows programmers to define custom tags in the format <dme:tagname/> or <dme:tagname>...</dme:tagname>, and the code looks for these tags with regex. When I started, I read a lot of sites saying that regex wasn't the right way to parse HTML-like or XML-like tags, and the more I've used and improved my code, the more that seems true. It works well for the simple cases, but when tags are nested inside other tags, it gets hard to make sure each opening tag is paired with the right closing tag. Happily, on the production sites where we've used it, the tags have been very simple, on the order of 'put this image here', so there haven't been any problems.
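To make the nesting problem concrete, here is a minimal sketch. It is written in Python rather than the module's PHP, and the patterns and function names (PAIRED, TOKEN, regex_pairs, stack_pairs) are made up for illustration; none of this is DME's actual code. It assumes tags carry no attributes. The first function pairs tags with a single lazy regex, the second walks the tags in order and keeps a stack of the ones still open.

```python
import re

# Hypothetical pattern, not DME's actual regex: match a paired tag lazily,
# using a backreference so the closing name equals the opening name.
PAIRED = re.compile(r"<dme:(\w+)>(.*?)</dme:\1>", re.DOTALL)

# Token pattern for the stack-based scanner: an opening, closing, or
# self-closing tag. Assumes tags have no attributes.
TOKEN = re.compile(r"<(/?)dme:(\w+)\s*(/?)>")


def regex_pairs(text):
    """Return (name, body) for each match of the naive paired-tag regex."""
    return [(m.group(1), m.group(2)) for m in PAIRED.finditer(text)]


def stack_pairs(text):
    """Pair tags with a stack; return (name, body) in the order tags close."""
    stack = []  # entries: (tag name, offset where the tag's body starts)
    pairs = []
    for m in TOKEN.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if closing:
            # A closing tag must match the most recently opened tag.
            if not stack or stack[-1][0] != name:
                raise ValueError(f"unexpected </dme:{name}> at offset {m.start()}")
            _, body_start = stack.pop()
            pairs.append((name, text[body_start:m.start()]))
        elif self_closing:
            pairs.append((name, ""))
        else:
            stack.append((name, m.end()))
    if stack:
        raise ValueError(f"unclosed <dme:{stack[-1][0]}>")
    return pairs


sample = "<dme:box>a<dme:box>b</dme:box>c</dme:box>"
print(regex_pairs(sample))
print(stack_pairs(sample))
```

On the sample, the lazy regex stops at the first closing tag, so it reports a single box whose body is the unbalanced fragment `a<dme:box>b`. The stack version reports the inner box with body `b` and then the outer box with the whole nested content. Switching the regex to a greedy match does not fix this; it just breaks in the other direction when two sibling tags share a name.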