Description
Text is everywhere. Web pages, databases, the contents of files: for almost any programming task you perform, you need to process text. Cut even the most complex text-based tasks down to size and learn how to master regular expressions, scrape information from Web pages, develop reusable utilities to process text in pipelines, and more.
Most information in the world is in text format, and programmers often find themselves needing to make sense of the data hiding within. It might be to convert it from one format to another, or to find out information about the text as a whole, or to extract information from it. But how do you do this efficiently, avoiding labor-intensive, manual work?
Text Processing with Ruby takes a practical approach. You’ll learn how to get text into your Ruby programs from the file system and from user input. You’ll process delimited files such as CSVs, and write utilities that interact with other programs in text-processing pipelines. Decipher character encoding mysteries, and avoid the pain of jumbled characters and malformed output.
You’ll learn to use regular expressions to match, extract, and replace patterns in text. You’ll write a parser and learn how to process Web pages to pull out information from even the messiest of HTML.
Before long you’ll be able to tackle even the most enormous and entangled text with ease, scything through gigabytes of data and effortlessly extracting the bits that matter.
About the Author
Rob Miller is Operations Director at a London-based marketing consultancy. He spends his days merrily chewing through huge quantities of text in Ruby, turning raw data into meaningful analysis. He blogs at robm.me.uk and tweets @robmil.