reading-notes

reading notes for code fellows


Project maintained by dLeigh01 Hosted on GitHub Pages — Theme by mattgraham

Data Analysis

These subjects matter because I assume we will start using Jupyter Lab soon, and NumPy is integral to many of the Python packages we are likely to come across.

What is Jupyter Lab?

Jupyter Lab is a web-based user interface for Project Jupyter that lets users work with documents and activities such as Jupyter notebooks.

It runs a text-based editor with two modes, command and edit, which let you create text and blocks of code in separate cells to keep your ideas well formatted. All of the code cells in a notebook share one Python session, so when you change something like a variable in one cell, the new value is visible in the other cells too. You can also run cells out of the order in which they are written, and the editor numbers them in the order they were run so you can keep track of what you've done. Text is easily formatted with Markdown, and the lab supports many different file types, including the ability to rotate and zoom pictures within the program.
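The shared state between cells can be sketched as a plain script. This is only an illustration: the cell labels and the variable names rate and total are made up for the example, and each comment stands in for a separate notebook cell.

```python
# Cell A, first run (shown as In [1])
rate = 0.5

# Cell B (In [2]) reads the variable that Cell A defined
total = 100 * rate
first_total = total

# Cell A edited and re-run (In [3]); the new value is visible to every cell
rate = 0.25

# Cell B re-run (In [4]) picks up the change
total = 100 * rate

print(first_total, total)
```

Until Cell B is run again, total keeps the value computed from the old rate, which is why the run-order numbers are worth watching.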

NumPy Tutorial

NumPy is a Python data analysis package that can speed up your workflow and interface with other Python packages that use it. Since it has been in use for around 20 years, the majority of data analysis and machine learning packages in Python use it in some way.

Python's csv module can read an ssv (semicolon-separated values) file, given ; as the delimiter, and separate it into a list of lists, with each inner list being a row of data from the original file and every value represented as a string. NumPy works with two-dimensional arrays like these, otherwise known as matrices. Within a NumPy array, the number of dimensions is called the rank and each dimension is called an axis. You can create a NumPy array by passing your multi-dimensional list to numpy.array(). All elements of a NumPy array must be the same type, so if you want a numeric array of floats or integers, leave out any rows of strings (such as a header row) during this step. You can also generate a NumPy array directly from a file using numpy.genfromtxt("filename", delimiter=";", skip_header=1), where delimiter is the character that separates the values (often ;) and skip_header=1 skips the first line of the file.

You access values in a NumPy array in the same way as in any other multi-dimensional array, by specifying an index for each axis, with indexes starting at 0. You can also slice a NumPy array: array[0:3, 3] takes column 3 from the rows starting at the index before the colon and going up to, but not including, the index after it. A bare colon in place of a number retrieves every value along that axis, so array[:, 3] is all of column 3. If you need to change the data type of a NumPy array, you can use its astype method (numpy.ndarray.astype), which returns a converted copy.
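Here is a minimal sketch of loading, indexing, and slicing as described above. The column names and numbers are invented for the example, and an in-memory string stands in for a real semicolon-separated file.

```python
import csv
import io

import numpy as np

# Hypothetical semicolon-separated data; a real file would be opened with open().
text = "width;height;score\n1.0;2.0;3\n4.0;5.0;6\n7.0;8.0;9\n"

# csv.reader yields each row as a list of strings.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))

# Drop the header row so every remaining value can be converted to float.
data = np.array(rows[1:], dtype=float)

# The same data loaded in one step, skipping the header line.
loaded = np.genfromtxt(io.StringIO(text), delimiter=";", skip_header=1)

print(data.ndim)      # number of axes: 2
print(data.shape)     # (3, 3)
print(data[1, 2])     # row 1, column 2: 6.0
print(data[0:2, 2])   # rows 0 and 1 of column 2
print(data[:, 0])     # every row of column 0

# astype returns a new array with the requested type.
as_int = data.astype(int)
print(as_int.dtype)
```

Both routes should give the same 3x3 float array here; genfromtxt saves the manual step of dropping the header.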

If you want to do math on an entire row or column, you can with something like array[:, 5] += 5, which adds 5 to every value in column 5. You can also multiply two arrays together in a similar way, element by element. If you attempt an operation between arrays whose sizes along a dimension are not equal, and neither size is one, you will receive an error. There are many built-in methods for math on NumPy arrays as well, such as .sum(), .std() (standard deviation), .min(), and .max(). If you use a comparison operator, you get back a Boolean array showing which values meet or miss the condition. If you use that Boolean array as part of your index selection, you get back only the values that meet the condition.
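The operations above can be sketched on a small array. The values and the name scores are made up for illustration; only standard NumPy calls are used.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Add 5 to every value in column 2, in place.
scores[:, 2] += 5

# Element-wise multiplication; the shape (3,) array is applied to each row.
doubled = scores * np.array([2.0, 2.0, 2.0])

# Shapes (2, 3) and (2,) do not line up and neither mismatched size is 1.
try:
    scores + np.array([1.0, 2.0])
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

# Built-in reductions over the whole array.
print(scores.sum(), scores.min(), scores.max(), scores.std())

# A comparison gives a Boolean array with one entry per row here.
mask = scores[:, 2] > 10

# Using the mask as an index keeps only the rows where it is True.
selected = scores[mask]
print(mask, selected)
```

The mask can be combined with a column index as well, for example scores[mask, 2], to pull out single values rather than whole rows.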

Things I Want to Know More About

Discussion

Jupyter Lab is a way to use Jupyter Notebooks in the web browser, which should be a good way to take Python notes. It is similar to how VSCode has been useful for note taking, only more powerful, because it can actually run the code snippets being inserted rather than being limited to a Markdown file. NumPy is a major part of data analysis that will help us format and work with the data we come across.
