Sunday, November 24, 2019
How to Use String Substitution in Ruby
Splitting a string is only one way to manipulate string data. You can also make substitutions to replace one part of a string with another string. For instance, in an example string (foo,bar,baz), replacing foo with boo would yield boo,bar,baz. You can do this and much more using the sub and gsub methods of the String class.

Many Options for Ruby Substitution

The substitution methods come in two varieties. The sub method is the more basic of the two and comes with the fewest surprises: it replaces only the first instance of the designated pattern with the replacement. The gsub method, by contrast, replaces every instance of the pattern.

In addition, both sub and gsub have sub! and gsub! counterparts. Remember that, by convention, Ruby methods ending in an exclamation point modify the object in place instead of returning a modified copy.

Search and Replace

The most basic usage of the substitution methods is to replace one static search string with one static replacement string. In the above example, foo was replaced with boo. This can be done for the first occurrence of foo in the string using the sub method, or for all occurrences of foo using the gsub method.

    #!/usr/bin/env ruby
    a = "foo,bar,baz"
    # sub returns a new string with only the first "foo" replaced
    b = a.sub( "foo", "boo" )
    puts b

    $ ./1.rb
    boo,bar,baz

Flexible Searching

Searching for static strings can only go so far. Eventually, you'll run into cases where a subset of strings, or strings with optional components, will need to be matched. The substitution methods can also match regular expressions instead of static strings, which makes them flexible enough to match virtually any text you can describe.

This example is a little more real world. Imagine a set of comma-separated values. These values are fed into a tabulation program over which you have no control (closed source).
The program that generates these values is closed source as well, but it's outputting some badly formatted data. Some fields have spaces after the comma, and this is causing the tabulator program to break.

One possible solution is to write a Ruby program to act as glue, or a filter, between the two programs. This Ruby program will fix any problems in the data formatting so the tabulator can do its job. The fix is quite simple: replace a comma followed by one or more spaces with just a comma.

    #!/usr/bin/env ruby
    STDIN.each do |l|
      # Collapse a comma followed by one or more spaces into a bare comma
      l.gsub!( /, +/, "," )
      puts l
    end

    $ cat data.txt
    10, 20, 30
    12.8, 10.4,11
    $ cat data.txt | ./2.rb
    10,20,30
    12.8,10.4,11

Flexible Replacements

Now imagine this situation. In addition to the minor formatting errors, the program that produces the data writes numbers in scientific notation. The tabulator program doesn't understand this, so you're going to have to replace it. A simple gsub with a fixed replacement won't do here, because the replacement is different for every match.

Luckily, the substitution methods can take a block in place of the replacement argument. Each time the search string (or regex) is found, the matched text is passed to this block, and the value returned by the block is used as the substitution string. In this example, a floating point number in scientific notation (such as 1.232e4) is converted to a normal number with a decimal point. The string is converted to a number with to_f, then the number is formatted using a format string.

    #!/usr/bin/env ruby
    STDIN.each do |l|
      # Replace each number in scientific notation with fixed-point text
      l.gsub!( /-?\d+\.\d+e-?\d+/ ) do |n|
        "%.3f" % n.to_f
      end
      # Then strip the spaces after commas, as before
      l.gsub!( /, +/, "," )
      puts l
    end

    $ cat floatdata.txt
    2.215e-1, 54, 11
    3.15668e6, 21, 7
    $ cat floatdata.txt | ./3.rb
    0.222,54,11
    3156680.000,21,7

Not Familiar With Regular Expressions?

Let's take a step back and look at that regular expression. It looks cryptic and complicated, but it's actually very simple. If you're not familiar with regular expressions, they can be quite cryptic.
However, once you are familiar with them, they're a straightforward and natural way of describing text. There are a number of elements, and several of the elements have quantifiers.

The primary element here is the \d character class. This will match any digit, the characters 0 through 9. The + quantifier is used with the digit character class to signify that one or more of these digits should be matched in a row. You have three groups of digits: the first two separated by a . and the third preceded by the letter e (for exponent).

The second element floating around is the minus character, which uses the ? quantifier. This means "zero or one" of these elements. So, in short, there may or may not be a negative sign at the beginning of the number or the exponent.

The two other elements are the . (period) character and the e character. The period is written as \. because an unescaped . would match any single character. Combine all this, and you get a regular expression (or set of rules for matching text) that matches numbers in scientific form (such as 12.34e56).
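
To see the pieces of that pattern at work, here is a small sketch that applies it to a few sample strings. The constant name SCI_NOTATION and the helper expand_scientific are made up for this illustration and are not part of the scripts above. The sketch assumes the pattern is written with + quantifiers on each digit group, as the description says.

```ruby
# Optional minus, digits, a literal ".", digits, "e", optional minus, digits.
# SCI_NOTATION is a hypothetical name chosen for this sketch.
SCI_NOTATION = /-?\d+\.\d+e-?\d+/

# Hypothetical helper: returns a copy of `line` in which every number in
# scientific notation is rewritten as fixed-point with three decimals.
def expand_scientific(line)
  line.gsub(SCI_NOTATION) { |n| "%.3f" % n.to_f }
end

puts expand_scientific("3.15668e6, 21, 7")  # prints: 3156680.000, 21, 7
puts expand_scientific("-1.5e2")            # prints: -150.000

# The pattern requires a decimal point, so "1e5" is left alone.
p "1e5"[SCI_NOTATION]                       # prints: nil
```

Note that the pattern is deliberately narrow: it only recognizes a lowercase e and insists on digits on both sides of the period, so inputs such as 1E5 or 5e3 would pass through unchanged. Whether that is acceptable depends on what the upstream program actually emits.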