reading-notes

reading notes for code fellows


Project maintained by dLeigh01 Hosted on GitHub Pages — Theme by mattgraham

Automation

Automation is a large part of Python development because it lets the computer run tasks on its own rather than requiring you to do everything by hand.

Python Regular Expressions Tutorial

A regular expression (regex) is a sequence of characters that describes a search pattern, used to check whether that pattern exists in a given string. Regexes help you manipulate textual data, which is often important for data science projects involving text mining. To use them in Python, import re, the standard library module that provides regex support.

Basic patterns are simple: a regex string such as r'hello' matches the literal text hello and has no other special meaning. The match() function returns a match object if the pattern matches at the beginning of the text, or None otherwise; to look for the pattern anywhere in the text, use search(). Special characters such as ., *, and ? don’t match themselves, but have a special meaning instead.
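As a minimal sketch of the difference between a literal pattern, match(), and a special character (the sample strings and variable names are made up for illustration):

```python
import re

text = "say hello to regex"  # hypothetical sample text

# search() looks for the pattern anywhere in the string.
found = re.search(r"hello", text)

# match() only succeeds when the pattern matches at the start of the string.
at_start = re.match(r"hello", text)       # None: text starts with "say"
starts_with_say = re.match(r"say", text)  # a match object

# "." is a special character meaning "any character except newline";
# escape it with a backslash to match a literal dot.
any_char = re.search(r"a.c", "abc")      # matches "abc"
literal_dot = re.search(r"a\.c", "abc")  # None: there is no "." in "abc"

if found:
    print(found.group())
```

Because both functions return None when nothing matches, it is worth checking the result before calling methods on it.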

In regex, you can group parts of the matching text using parentheses. You can also use (?P<name>...) to create a named group. Quantifiers such as * and + are greedy by default, meaning they match as much of the string as possible. Adding a ? after a greedy quantifier makes it non-greedy, so it matches as little as possible.
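Here is a small sketch of numbered groups, a named group, and greedy versus non-greedy matching. The sample strings and the group name role are assumptions chosen for the example:

```python
import re

line = "name: Ada, role: engineer"  # hypothetical sample text

# One numbered group and one named group in the same pattern.
m = re.search(r"name: (\w+), role: (?P<role>\w+)", line)
first_name = m.group(1)  # text captured by the first pair of parentheses
role = m.group("role")   # text captured by the named group

html = "<b>bold</b><i>italic</i>"

# Greedy: ".+" grabs as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.+>", html).group()

# Non-greedy: ".+?" stops at the first ">" it can.
lazy = re.search(r"<.+?>", html).group()

print(first_name, role)
print(greedy)
print(lazy)
```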

The search() function scans through the given string for the first location where the pattern matches, while the group() method of the resulting match object returns the string matched by the expression. compile() turns a regex pattern into a regular expression object, which can be saved and reused later on. findall() finds all non-overlapping matches in the string and returns them as a list of strings (or a list of tuples if the pattern contains more than one group). finditer() does the same thing, but returns match objects through an iterator. sub() returns the string with the matched pieces replaced with something else. split() splits the string wherever the pattern matches and returns a list. On a match object, start() gives the starting index of the match, end() gives the index where the match ends, and span() returns a tuple containing both positions.
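The functions above can be sketched together on one string. The sample text and variable names are invented for illustration:

```python
import re

text = "cat 12, dog 7, bird 301"  # hypothetical sample text

# compile() builds a reusable pattern object.
number = re.compile(r"\d+")

# search() returns a match object for the first match only.
first = number.search(text)
first_value = first.group()  # the matched string
first_span = first.span()    # same as (first.start(), first.end())

# findall() returns every non-overlapping match as a list of strings.
all_numbers = number.findall(text)

# finditer() yields match objects, so positions are available too.
spans = [m.span() for m in number.finditer(text)]

# sub() replaces each match; split() cuts the string at each match.
masked = number.sub("#", text)
parts = re.split(r",\s*", text)

print(first_value, first_span)
print(all_numbers)
print(spans)
print(masked)
print(parts)
```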

You can also add compilation flags to modify an expression’s behavior. For example, IGNORECASE (I) allows case-insensitive matches. DOTALL (S) allows . to match any character, including a newline. MULTILINE (M) makes the anchors ^ and $ match at the start and end of each line rather than only at the start and end of the whole string. VERBOSE (X) lets you write whitespace and comments inside an expression to make it more readable.
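A brief sketch of each flag in action, using made-up sample text and a hypothetical phone-number pattern:

```python
import re

text = "Hello\nworld"  # hypothetical two-line sample

# IGNORECASE: "hello" matches "Hello".
ignore_case = re.search(r"hello", text, re.IGNORECASE)

# DOTALL: without it "." does not match the newline between the words.
without_dotall = re.search(r"Hello.world", text)
with_dotall = re.search(r"Hello.world", text, re.DOTALL)

# MULTILINE: "^" matches at the start of every line, not just the string.
single_line = re.findall(r"^\w+", text)
multi_line = re.findall(r"^\w+", text, re.MULTILINE)

# VERBOSE: whitespace and "#" comments inside the pattern are ignored.
phone = re.compile(
    r"""
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
    """,
    re.VERBOSE,
)
phone_groups = phone.search("call 555-1234").groups()

print(single_line, multi_line, phone_groups)
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.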

Shutil

The shutil module contains high-level file operations. copyfile() copies the contents of the source file to the destination and raises an OSError (called IOError in older versions of Python) if the destination is not writable. Since it opens the source file for reading in order to copy it, certain special files, such as device files, cannot be copied this way. copyfileobj() copies between two already-open file-like objects. By default it reads in large blocks; passing -1 as the length reads all of the input at once, and passing a positive integer sets the block size. copy() works like copyfile(), but if the destination name refers to a directory instead of a file, it creates a new file in that directory using the source file’s name. copy() copies the file’s permissions along with its content; if you also want the access and modification times preserved, use copy2(). To copy only the metadata (permissions, timestamps, and flags) onto an existing file without touching its contents, use copystat().

To copy a directory from one place to another, use copytree(), and to remove a directory and all of its contents, use rmtree(). To move a file or directory from one place to another, use move(). If you’re looking for an executable, which() scans the search path for the given name and returns its full path, or None if nothing is found.
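The following is a minimal sketch of these functions working inside a temporary directory, so nothing outside it is touched. All file and directory names are hypothetical, and the command name passed to which() is deliberately one that should not exist:

```python
import io
import os
import shutil
import tempfile

# Work in a throwaway directory (all names below are made up).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "notes.txt")
with open(src, "w") as f:
    f.write("reading notes\n")

# copyfile(): contents only; the destination must be a file path.
plain_copy = os.path.join(workdir, "notes_copy.txt")
shutil.copyfile(src, plain_copy)

# copyfileobj(): copy between open file-like objects; -1 reads in one go.
in_buf, out_buf = io.StringIO("some text"), io.StringIO()
shutil.copyfileobj(in_buf, out_buf, -1)
buffer_text = out_buf.getvalue()

# copy(): the destination may be a directory; returns the new file's path.
backup_dir = os.path.join(workdir, "backup")
os.mkdir(backup_dir)
copied_name = os.path.basename(shutil.copy(src, backup_dir))

# copy2(): like copy(), but also tries to preserve timestamps.
shutil.copy2(src, os.path.join(workdir, "notes_meta.txt"))

# copytree() copies a whole directory; move() relocates a file.
tree_copy = os.path.join(workdir, "backup_tree")
shutil.copytree(backup_dir, tree_copy)
shutil.move(plain_copy, os.path.join(tree_copy, "moved.txt"))
tree_files = sorted(os.listdir(tree_copy))

# which() returns None when no executable with that name is on the path.
missing = shutil.which("no-such-command-for-this-example")

# rmtree() removes the directory and everything inside it.
shutil.rmtree(workdir)
workdir_exists = os.path.exists(workdir)

print(copied_name, tree_files, missing, workdir_exists)
```

Since rmtree() deletes everything under the path it is given, pointing it at a temporary directory like this is a safe way to experiment.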

Things I Want to Know More About

Discussion

This reminds me a bit of the web scraping from yesterday, which I think was also automation, since you let the computer go over the page, search for the correct things, and send them back.
