reading notes for code fellows
Automation is a large piece of python development, because it allows you to let the computer run things on its own rather than you needed to do everything.
Regular expressions (regex) are a sequence of characters used to check if a pattern exists in a given string or not. They help to manipulate textual data, which is often important for data science projects involving text mining. To use regex, you need to import re, which is the library that holds it.
Basic patterns are simple, the regex string is just something along the lines of r'hello', which will check if the word hello exists anywhere in the text and has no other special meaning. The match() function will return a match object if the text matches the pattern, or else it will return None. Special characters don’t match themselves, but have a special meaning instead.
. - matches any character other than newline^ - matches the start of a string$ - matches the end of a string[] - matches the set of symbols within the bracers\ - if the character after the backslash is an escape character it stops being one\w - matches any letter digit or underscore\W - matches any character not covered by \w\s - matches any whitespace character\S - matches any non-whitespace character\d - matches any digit\D - matches any non-digit+ - matches one or more of the preceeding character* - matches zero or more of the preceeding character? - checks if the preceeding character appeares exactly zero or one time\t - matches tab\n - matches newline\r - matches return\A - matches only at the start of the string\Z - matches only the end of a string\b - matches only the beginning or end of the word{x} - preceeding character repeats exactly x number of times{x,} - preceeding character repeats x times or more{x,y} - preceeding character repeats at least x and no more than y timesIn regex, you can group parts of matching text using parenthesis. You also can use (?P<name>...) to create a named group. When a special character matches as much of the string as possible, it is called greedy. If you add a ? after a greedy qualifier, it makes it perform in a less greedy way.
The search() function allows you to scan through the given string for the first with a match, while the group() function returns the string matched by the expression. compile() will turn a regex pattern into a regular expression object, which saves it for reuse later on. findall() finds all possible matches within the sequence and returns them as a list of strings. finditer() does the same thing, but returns regex match objects as an iterator. sub() returns the string with the matched pieces replaced with something else. split() splits the string wherever the pattern matches and returns a list. start() gives the starting index of the match and end() returns the index where the match ends and span() returns a tuple containing the start and end positions of the match.
You can also add in compilation flags to modify an expressions behavior, for example, IGNORECASE (I) allows case insensitive matches, DOTALL (S) allows . to match any character including newline, MULTILINE (M) allows the start and end of string anchors to match newlines as well, and VERBOSE (X) allows you to write whitespace and comments in an expression to make it more readable.
The shutil module contains high-level file operations. copyfile() copies the contents of the source to the destination and raises an IOError if it doesn’t have the right permissions. Since it opens the file to copy it, certain types of files cannot be copied. The default behavior of copyfileobj() is to read using large blocks, which can be avoided by using -1 to read all of the input at once or a different positive integer to set the block size. The copy() function will create a new file in a directory if the destination name refers to a directory instead of a file. By default, the permissions of the file are copied along with the content, but if you also want the access and modification times from the metadata, you will need to use copy2(). If you want other metadata about the file, you should use copystat(). To copy a directory from one place to another, use copytree(), and to remove a directory and its contents use rmtree(). To move a file or directory from one place to another, use move(). If you’re looking to find a file, which() will scan a path looking for a searched name, and will return None if nothing is found.
This reminds me a bit of the web scraping from yesterday, which I think was also automation, as you allowed the computer to go over the page and search for the corret things then send them back.