Microsoft Beefs Up VBScript with Regular Expressions
[by n3015m]
Option Explicit
' Reads a comma-separated file and prints the entry number (first field)
' and the host portion of the URL (sixth field) for each line.
Const ForReading = 1
Dim bl_num, bl_ip, bl_url, bl_filename
Dim objFSO, objFile, strLine, fields, objArgs
Dim regExIP, regExURL, chk_ip, chk_url
Set objArgs = WScript.Arguments
Set regExIP = New RegExp
Set regExURL = New RegExp
' $1 = dotted-quad IP address, $2 = anything that follows it
regExIP.Pattern = "([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})(.*)?"
' $1 = optional scheme (including defanged hxxp/hxxps), $2 = host[:port], $3 = path
regExURL.Pattern = "\??(hxxp:\/\/|hxxps:\/\/|http:\/\/|https:\/\/)?([a-zA-Z0-9\-\.\:]+)(\/.*)?"
If objArgs.Count = 0 Then
    WScript.Echo "Filename argument required." & vbCrLf & "- usage : " & WScript.ScriptName & " <filename>"
    WScript.Quit
End If
bl_filename = objArgs(0)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(bl_filename, ForReading)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    ' Split once; skip lines with fewer than six fields to avoid a subscript error
    fields = Split(strLine, ",")
    If UBound(fields) >= 5 Then
        bl_num = Trim(fields(0))
        bl_ip = Trim(fields(4))
        bl_url = Trim(fields(5))
        If bl_url <> "" Then
            ' Keep only the captured group; Replace affects the first match only
            chk_ip = Trim(regExIP.Replace(bl_ip, "$1"))
            chk_url = Trim(regExURL.Replace(bl_url, "$2"))
            'WScript.Echo bl_num & vbTab & bl_ip & vbTab & chk_ip & vbTab & bl_url
            WScript.Echo bl_num & vbTab & chk_url
        End If
    End If
Loop
objFile.Close
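To see what the two patterns in the script extract, it can help to run them against a few field values in isolation. The sketch below does that; the input strings are hypothetical (example.com and 192.0.2.10 are documentation placeholders, not data from any real file), and it assumes the script is run with cscript so each result prints on its own line.

```vbscript
Option Explicit
Dim regExIP, regExURL
Set regExIP = New RegExp
Set regExURL = New RegExp

' Same patterns as the script above
regExIP.Pattern = "([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})(.*)?"
regExURL.Pattern = "\??(hxxp:\/\/|hxxps:\/\/|http:\/\/|https:\/\/)?([a-zA-Z0-9\-\.\:]+)(\/.*)?"

' Hypothetical field values, used only for illustration
WScript.Echo regExIP.Replace("192.0.2.10:8080", "$1")                    ' 192.0.2.10
WScript.Echo regExURL.Replace("hxxp://example.com/path/file.exe", "$2")  ' example.com
WScript.Echo regExURL.Replace("https://example.com:8443/", "$2")         ' example.com:8443
```

Because the host character class includes the colon, a port number stays attached to the host in the output.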
VBScript RegExp Object
Properties | Methods |
---|---|
Pattern | Test (search-string) |
IgnoreCase | Replace (search-string, replace-string) |
Global | Execute (search-string) |
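A minimal sketch of the three properties and three methods working together. The sample strings are made up for illustration, and the script is assumed to be run with cscript.

```vbscript
Option Explicit
Dim re, matches, m

Set re = New RegExp
re.Pattern = "cat"
re.IgnoreCase = True   ' match "Cat" and "CAT" as well as "cat"
re.Global = True       ' act on every match, not just the first

' Test reports whether the pattern occurs anywhere in the string
If re.Test("A Cat in the hat") Then WScript.Echo "match found"

' Replace returns a new string; the input itself is left unchanged
WScript.Echo re.Replace("cat, Cat, CAT", "dog")   ' dog, dog, dog

' Execute returns a Matches collection; FirstIndex is zero-based
Set matches = re.Execute("cat, Cat, CAT")
For Each m In matches
    WScript.Echo m.FirstIndex & ": " & m.Value
Next
```

With Global left at its default of False, Replace and Execute would stop after the first match.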
Position Matching
Symbol | Function |
---|---|
^ | Only match the beginning of a string. "^A" matches first "A" in "An A+ for Anita." |
$ | Only match the ending of a string. "t$" matches the last "t" in "A cat in the hat" |
\b | Matches any word boundary. "ly\b" matches "ly" in "possibly tomorrow." |
\B | Matches any non-word boundary. |
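The anchors are easiest to see by marking what they match. This sketch reuses the sample strings from the table and wraps each match in square brackets; the bracket markers are only an illustration device.

```vbscript
Option Explicit
Dim re
Set re = New RegExp   ' Global defaults to False, so only the first match is replaced

re.Pattern = "^A"
WScript.Echo re.Replace("An A+ for Anita.", "[A]")     ' [A]n A+ for Anita.

re.Pattern = "t$"
WScript.Echo re.Replace("A cat in the hat", "[t]")     ' A cat in the ha[t]

re.Pattern = "ly\b"
WScript.Echo re.Replace("possibly tomorrow.", "[ly]")  ' possib[ly] tomorrow.
```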
Literals
Symbol | Function |
---|---|
Alphanumeric | Matches alphabetical and numerical characters literally. |
\n | Matches a new line |
\f | Matches a form feed |
\r | Matches carriage return |
\t | Matches horizontal tab |
\v | Matches vertical tab |
\? | Matches ? |
\* | Matches * |
\+ | Matches + |
\. | Matches . |
\| | Matches | |
\{ | Matches { |
\} | Matches } |
\\ | Matches \ |
\[ | Matches [ |
\] | Matches ] |
\( | Matches ( |
\) | Matches ) |
\xxx | Matches the ASCII character expressed by the octal number xxx. "\50" matches "(" or Chr(40). |
\xdd | Matches the ASCII character expressed by the hex number dd. "\x28" matches "(" or Chr(40). |
\uxxxx | Matches the Unicode character expressed by the four-digit hex number xxxx. "\u00A3" matches "£". |
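Forgetting to escape a metacharacter is a common source of patterns that match too much. The following sketch, with made-up sample strings, contrasts an unescaped and an escaped dot and shows the hex escape for a left parenthesis.

```vbscript
Option Explicit
Dim re
Set re = New RegExp

' Unescaped, the dot is a wildcard, so "3.14" also matches "3x14"
re.Pattern = "3.14"
If re.Test("3x14") Then WScript.Echo "3.14 matches 3x14"

' Escaped, the dot matches only a literal period
re.Pattern = "3\.14"
If Not re.Test("3x14") Then WScript.Echo "3\.14 does not match 3x14"

' \x28 and \( both match a left parenthesis, Chr(40)
re.Pattern = "\x28\d+\)"
If re.Test("call (42)") Then WScript.Echo "found a parenthesized number"
```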
Character Classes
Symbol | Function |
---|---|
[xyz] | Match any one character enclosed in the character set. "[a-e]" matches "b" in "basketball". |
[^xyz] | Match any one character not enclosed in the character set. "[^a-e]" matches "s" in "basketball". |
. | Match any character except \n. |
\w | Match any word character. Equivalent to [a-zA-Z_0-9]. |
\W | Match any non-word character. Equivalent to [^a-zA-Z_0-9]. |
\d | Match any digit. Equivalent to [0-9]. |
\D | Match any non-digit. Equivalent to [^0-9]. |
\s | Match any space character. Equivalent to [ \t\r\n\v\f]. |
\S | Match any non-space character. Equivalent to [^ \t\r\n\v\f]. |
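Character classes pair naturally with Replace when the goal is to strip or substitute a whole category of characters. A short sketch, using hypothetical input strings:

```vbscript
Option Explicit
Dim re
Set re = New RegExp
re.Global = True   ' replace every match, not just the first

' Remove every character that is not in the range a-e
re.Pattern = "[^a-e]"
WScript.Echo re.Replace("basketball", "")           ' baeba

' Remove every non-digit
re.Pattern = "\D"
WScript.Echo re.Replace("555-0100 ext. 7", "")      ' 55501007

' Replace each space or tab with an underscore
re.Pattern = "\s"
WScript.Echo re.Replace("a b" & vbTab & "c", "_")   ' a_b_c
```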
Repetition
Symbol | Function |
---|---|
{x} | Match exactly x occurrences of a regular expression. "\d{5}" matches 5 digits. |
{x,} | Match x or more occurrences of a regular expression. "\s{2,}" matches at least 2 space characters. |
{x,y} | Match from x to y occurrences of a regular expression. "\d{2,3}" matches at least 2 but no more than 3 digits. |
? | Match zero or one occurrences. Equivalent to {0,1}. "a\s?b" matches "ab" or "a b". |
* | Match zero or more occurrences. Equivalent to {0,}. |
+ | Match one or more occurrences. Equivalent to {1,}. |
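The quantifiers can be tried out as follows. This is a sketch with invented sample values; the ^ and $ anchors are added so that Test checks the whole string rather than any substring.

```vbscript
Option Explicit
Dim re
Set re = New RegExp

' Exactly five digits, with nothing before or after them
re.Pattern = "^\d{5}$"
If re.Test("90210") Then WScript.Echo "90210 has exactly five digits"
If Not re.Test("9021") Then WScript.Echo "9021 does not"

' ? makes the whitespace character optional
re.Pattern = "^a\s?b$"
If re.Test("ab") And re.Test("a b") Then WScript.Echo "ab and a b both match"

' Collapse each run of two or more whitespace characters into one space
re.Global = True
re.Pattern = "\s{2,}"
WScript.Echo re.Replace("too   many    spaces", " ")   ' too many spaces
```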
Alternation & Grouping
Symbol | Function |
---|---|
() | Groups part of a pattern into a single clause, which can then be repeated or captured. Groups may be nested. "(ab)?(c)" matches "abc" or "c". |
| | Alternation combines clauses into one regular expression and then matches any of the individual clauses. "(ab)|(cd)|(ef)" matches "ab" or "cd" or "ef". |
BackReferences
Symbol | Function |
---|---|
()\n | Matches the text captured earlier by the nth parenthesized group, where groups are numbered by the order of their left parentheses. "(\w+)\s+\1" matches any word that occurs twice in a row, such as "hubba hubba." |
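Backreferences inside a pattern (\1) have a counterpart in the replacement string ($1), which is what the script at the top of this article relies on. A small sketch, using an invented sentence, that finds doubled words and then removes the duplicates:

```vbscript
Option Explicit
Dim re, m
Set re = New RegExp
re.Global = True
re.Pattern = "(\w+)\s+\1"

' List each doubled word; SubMatches(0) holds the text captured by group 1
For Each m In re.Execute("hubba hubba, it is is fine")
    WScript.Echo m.Value & " -> " & m.SubMatches(0)
Next

' In the replacement string, $1 refers to the same captured group
WScript.Echo re.Replace("hubba hubba, it is is fine", "$1")   ' hubba, it is fine
```

Note that this pattern has no word-boundary anchors, so it would also match across word fragments such as the "is is" inside "this is"; wrapping it as "\b(\w+)\s+\1\b" is one way to avoid that.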