DATA 310: Applied Machine Learning
Course ID: DATA 310
Course Attribute: None
Title: Applied Machine Learning
Credit Hours: 3
Meeting Times: 10:10 to 12:00 MTWTh
Location: Remote Synchronous Off-Campus
Date Range: Summer II Semester 2021
Course Description
This course focuses on the technical application of machine learning algorithms, their nature, and discussions of the potential drawbacks and advantages of different classes of algorithms. Students entering this course should have, at a minimum, a background in Python and linear algebra. No single algorithm will be covered in great depth; instead, the course places its focus on the code and implementation choices necessary for each class of algorithm. Topics covered will include data processing, regression in ML, decision trees, forests, k-NN, support vector machines, kernel SVM, k-means and hierarchical clustering, association rules, natural language processing, neural networks, and various associated approaches. Pre-requisite(s): (DATA 141 OR DATA 140 OR CSCI 140 OR CSCI 141) AND (DATA 146 OR CSCI 146)
Goals and Objectives:
- To provide students with a critical understanding of the variety of tools that can be used for machine learning.
- To develop students' ability to communicate findings, analysis, and visualizations for future courses (and jobs).
- To expose students to real-world problems that are being engaged with by contemporary problem solvers and decision makers.
Honor Code
Among our most significant traditions is the student-administered honor system. The Honor Code is an enduring tradition with a documented history that originates as far back as 1736. The essence of our honor system is individual responsibility. Today, students, such as yourself, administer the Honor pledge to each incoming student while also serving to educate faculty and administration on the relevance of the Code and its application to students’ lives.
The Pledge
“As a member of the William and Mary community, I pledge on my honor not to lie, cheat, or steal, either in my academic or personal life. I understand that such acts violate the Honor Code and undermine the community of trust, of which we are all stewards.”
Accessibility, Attendance & Universal Learning
William & Mary accommodates students with disabilities in accordance with federal laws and university policy. Any student who feels s/he may need accommodation based on the impact of a learning, psychiatric, physical, or chronic health diagnosis should contact Student Accessibility Services staff at 757-221-2509 or at sas@wm.edu to determine if accommodations are warranted and to obtain an official letter of accommodation. For more information, please see www.wm.edu/sas.
I am committed to the principle of universal learning. This means that our classroom, our virtual spaces, our practices, and our interactions should be as inclusive as possible. Mutual respect, civility, and the ability to listen and observe others carefully are crucial to universal learning. Active, thoughtful, and respectful participation in all aspects of the course will make our time together as productive and engaging as possible.
Grade Categories
| Descriptor | Grade | Range |
| --- | --- | --- |
| exceptional | A | 100–97.0 |
| excellent | A | 96.9–93.0 |
| superior | A- | 92.9–90.0 |
| very good | B+ | 89.9–87.0 |
| good | B | 86.9–83.0 |
| above average | B- | 82.9–80.0 |
| normal | C+ | 79.9–77.0 |
| average | C | 76.9–73.0 |
| sub par | C- | 72.9–70.0 |
| below average | D+ | 69.9–67.0 |
| poor | D | 66.9–63.0 |
| very poor | D- | 62.9–60.0 |
| failing | F | < 60.0 |

Note: a range endpoint of .9 denotes .9 repeating (bar notation).
Grading Opportunities
| Category | Weight | Details |
| --- | --- | --- |
| Exercises | 40% | 4 modules |
| Projects | 40% | 4 projects |
| Final Project | 20% | due by 5PM on the last day of finals |
Semester Schedule
Week 1 (July 5)
Tuesday:
- 10:10 to 11:00 - Getting Started
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Wednesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 - Close early
Thursday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Saturday:
- Informal Responses due by 5PM
Sunday:
- Project 1 due by 5PM
Week 2 (July 12)
Monday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Tuesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Wednesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Thursday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Saturday:
- Informal Responses due by 5PM
Sunday:
- Project 2 due by 5PM
Week 3 (July 19)
Monday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Tuesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Wednesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Thursday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Saturday:
- Informal Responses due by 5PM
Sunday:
- Project 3 due by 5PM
Week 4 (July 26)
Monday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Tuesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Wednesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Thursday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Saturday:
- Informal Responses due by 5PM
Sunday:
- Project 4 due by 5PM
Week 5 (August 2)
Monday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Tuesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Wednesday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Thursday:
- 10:10 to 11:00 - Exercise
- 11:00 to 11:30 - Small groups
- 11:30 to 12:00 - Reconvene
Saturday:
- Informal Responses due by 5PM
Sunday:
- Final Project due by 5PM
Module 1
Tuesday
- Download and install Python 3.9.6
The simplest way to install the latest version of Python is to go to python.org and download the most recent release. From the main python.org landing page you should find a tab linking to the download for Python 3.9.6. Once there, download and install the latest version of Python for your Mac or Windows system.
- Download and install PyCharm Professional
In this class we will use an integrated development environment (IDE) called PyCharm. Go to the jetbrains.com website and download the Professional version for your Mac or Windows system. After installing PyCharm, launch it from your applications folder. You can use the "Evaluate for free" button for now, until you obtain your JetBrains educational license.
- Set the Python interpreter
After launching PyCharm, select New Project and give your project a name (perhaps DATA310). For the Python interpreter, select "Previously configured interpreter" and click the "..." button to the right of the interpreter drop-down menu. This brings you to a new window, where you will want to select the "System Interpreter" option in the left option bar. Hopefully the option for python3.9 already appears in the interpreter drop-down menu; if not, you will likely need to click the "..." button again and manually select the python3.9 interpreter from within your file system. On a Mac, the path to the version of Python you installed in step 1 is likely /usr/local/bin/python3.9. Once your Python 3.9.6 interpreter has been selected, click the OK button in the bottom right-hand corner.
- Confirm your Python session
After creating a new script, click on the Python console at the bottom of the screen and confirm your environment is using Python 3.9.6. In your new script, add a line where you send a command to the interpreter.

2 + 2

The easiest way to execute the line of code is to make sure the cursor is on that line and then use the Alt-Shift-E combination to send the command to the interpreter for a result. Alternatively, you can right-click (or two-finger click) on the line and select the "Execute line in Python console" option. If Python returns the correct answer, you have properly configured your environment.
- Install a new package
Your PyCharm IDE includes a package manager that makes package installation and updating fairly simple. To install a new package, click on the Preferences tab under the PyCharm pull-down menu and then select the Project tab. Under the Project tab you should see a subtab called Python Interpreter and, off to the right, two packages that have already been installed (these should be pip and setuptools). First, make sure both packages have the most up-to-date versions installed; if not, use the up-arrow symbol to upgrade each to its latest version. Next select the + button to install a new package. Type tensorflow in the search window and then select Install Package. It might take a minute or two for the tensorflow package to complete its installation process; you can monitor your IDE's progress bar at the bottom while you wait. After returning to the main window of your package manager, you should notice a number of additional dependent packages that were installed in support of tensorflow. Close the Preferences window and add a new line to your script:

import tensorflow

Again, your IDE may take a minute or two to execute the command for the first time, but once complete you should see the >>> prompt indicating that your interpreter is ready to execute another command. If you click on the progress bar, you can review the background tasks that involve updating the interpreter.
After the system finishes, add the following lines to your script and confirm the output.
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)
- Next go to the JetBrains webpage and register for an educational license. After receiving your license, return to the Welcome to PyCharm page and click on the gear symbol in the bottom left hand corner, where you will find the option to manage licenses. Enter your license to register your Professional version of PyCharm.
- Next install the JetBrains IDE WebStorm for interacting with your GitHub account. WebStorm is a good IDE for producing markdown files for publishing on your public-facing GitHub Pages site.
- Next, if you don't already have one, create a GitHub account. Once you have a GitHub account, create a new repository; perhaps call it data310. Each repository needs to have a README.md file in its root directory. Be sure to create one, which you can use as the index for your repository of formal and informal assignments.
- Go to the Settings tab in your repository and click on the Pages tab in the left-hand pane. Set the branch to main, save your GitHub Pages settings, and then note where your site has been published. Click on the link and reload it until it appears; sometimes this can take a minute or two.
- In the main page of your repository open the code tab and copy the HTTPS address that you will use to link your repository to WebStorm via version control. Go back to the Welcome to Webstorm page and select the Get from VCS tab. Paste your HTTPS address into the URL field in order to clone your GitHub pages repository.
- Once you have cloned the repository from your GitHub account, you have a local project that can be modified; those changes can then be pushed to the GitHub repo for publishing via GitHub Pages (which happens automatically through something called the Jekyll engine). Go ahead and modify your README.md file by setting it up as the index for your informal and formal assignments. Perhaps your index looks something like the following. Note how I created the week1.md file and linked it in the index.
Markdown is fairly simple to learn. Here is a guide.
Now push those changes to your repository by selecting the Commit tab on the left-hand side, highlighting all the files you have changed, adding a commit message, and then selecting Commit and Push. You may need to add your user name and e-mail and authorize your account. Now go back to your GitHub Pages web address and notice that the changes have taken place.
- Finally, go back to PyCharm and create a new Python script. Go to the TensorFlow webpage at https://www.tensorflow.org/tutorials/keras/classification, read and reproduce the script in PyCharm, and run it. Answer the following questions about the dataset.
- What is the shape of the images training set (how many and the dimension of each)?
- What is the length of the labels training set?
- What is the shape of the images test set?
Be prepared to discuss during tomorrow's class.
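If you would like to sanity-check your answers before class, a minimal sketch using the same Keras dataset loader as the tutorial:

```python
import tensorflow as tf

# Load Fashion MNIST exactly as the tutorial does
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

print(train_images.shape)  # (number of training images, height, width)
print(len(train_labels))   # number of training labels
print(test_images.shape)   # (number of test images, height, width)
```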
Wednesday
Start with the Basic classification: Classify images of clothing script that you prepared for today's class. Execute the following.
- From the Preprocess the data section of the script, modify the training image to produce three new images.
- Under the Make predictions section, present the array of predictions for an image from the test set other than the one given in the example script. What does this array represent? How were the Softmax() and argmax() functions applied? Read this post from DeepAI, What is the Softmax function, for a description of the two functions (focus on the softmax formula, calculating softmax, and the softmax vs. argmax sections). Does the output from np.argmax() match the label from your test_labels dataset?
- Under the Verify predictions section, plot two additional images (other than either of the two given in the example script) and include the graph of their predicted label as well as the image itself.
- Under the Use the trained model section, again select a new image from the test dataset. Produce the predictions for this newly selected image. Does the predicted value match the test label? Although you applied the argmax() function in this second instance, you did not use Softmax() a second time. Why is that so (please be specific)?
Copy the script you produced for the Fashion MNIST dataset. Modify the script as follows and produce the following output.
- Change the line fashion_mnist = tf.keras.datasets.fashion_mnist to mnist = tf.keras.datasets.mnist in order to switch the dataset from articles of clothing to handwritten digits. Produce a plot of 25 handwritten digits from the data with their labels indicated below each image (a plotting sketch follows this list). Fit the model and report the accuracy on the training dataset; likewise report the accuracy on the test dataset. As in the above example, from the Verify predictions section, plot two images and include the graph of their predicted label as well as the image itself.
- Evaluate how your model for the MNIST dataset compares with your model of the Fashion MNIST dataset. Which of the two models is more accurate? Why do you think this is so?
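A minimal sketch of the 25-digit plot referenced above, mirroring the plotting loop from the Fashion MNIST tutorial:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Load the handwritten digits and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0

# Plot the first 25 digits with their labels below each image
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])
plt.show()
```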
For Thursday's class, reproduce the script on Basic Text Classification. Please read through the script and confirm you are able to execute a basic inferential sentiment analysis.
Thursday
At the bottom of the Basic Text Classification exercise you prepared for today's class is a second similarly structured exercise. With two of your classmates, complete the multi-class classification on Stack Overflow questions exercise.
- Provide plots that describe the training and validation loss and accuracy. Interpret the plots. How is this model performing?
- Export the model and print the accuracy produced using the test data. Does the test data accuracy serve to validate your model?
- Produce three examples that are relevant to the multi-class classification model, predicting the language relevant to the Stack Overflow question. Produce the array generated by each of your three examples. Were these predictions accurate?
- Compare your multi-class model with your binary classification model that predicted sentiment as either positive or negative in response to various movie reviews. What is the most significant difference between the two types of models? Compare the two models and determine which demonstrated better performance. Substantiate your response.
Project 1
Collect data from Zillow to train a model that predicts home price.
- For reference, watch What is ML by Laurence Moroney, beginning at about minute 8, with the subsequent housing price exercise presented at the end of the video. You are also welcome to refer to the beginning of the second video in this series, First steps in computer vision, where the solution to the example housing price exercise is presented.
- Go to the Apify Zillow scraper and scrape a few hundred home sale listings from any city. Download the file as a .csv.
- Use the pandas read_csv() function to import your file into your PyCharm session. Create a data frame that has the following variables:
  - price
  - bedrooms
  - bathrooms
  - livingArea
- Check if your data frame has any NaN values and drop them in place as needed.
- As in the above referenced example, scale the living area and price values (a scaling sketch appears after this project's questions). Check the shape of your input data in preparation for specifying your model.
- Pipe the data to your model as follows.
Pipe the data to your model as follows.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[3])])
model.compile(optimizer='sgd', loss='mean_squared_error')
history = model.fit(X, y, epochs=100)
- Use model.predict() to predict the price of each home using the input variables.
- Inverse transform these values (the opposite of where you scaled the price above) and add them as a new column in your data frame.
- Use the following to plot model loss.
plt.plot(history.history['loss'], label='loss')
plt.ylim([add, add])
plt.xlim([add, add])
plt.xlabel('Epoch')
plt.ylabel('loss')
plt.legend()
plt.grid(True)
plt.show()
- Use the following to plot the asking price and predicted price for all of the homes in your selected city.
plt.scatter(your_dataframe['add_variable'], your_dataframe['add_variable'])
plt.xlim([add, add])
plt.ylim([add, add])
plt.axline([0, 0], [1, 1])
plt.xlabel('add_label')
plt.ylabel('add_label')
plt.legend()
plt.show()
- Produce the MSE for your model using the following.
mse = np.mean((y - y_pred) ** 2)
mse
- Answer the following questions.
- How did your model fare?
- In your estimation is there a particular variable that may improve model performance?
- Which of the predictions were the most accurate? In which percentile do these most accurate predictions reside? Did your model trend towards over- or under-predicting home values?
- Which feature appears to be the most significant predictor?
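For the scaling and inverse-transform steps above, here is one possible approach: a sketch assuming scikit-learn is installed, with a hypothetical file name for the Apify download and column names taken from the variable list above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("zillow_listings.csv")  # hypothetical file name
df = df[["price", "bedrooms", "bathrooms", "livingArea"]].dropna()

# Scale price and living area to [0, 1]; keep the scalers to invert later
price_scaler = MinMaxScaler()
area_scaler = MinMaxScaler()
df["price_scaled"] = price_scaler.fit_transform(df[["price"]]).ravel()
df["area_scaled"] = area_scaler.fit_transform(df[["livingArea"]]).ravel()

X = df[["bedrooms", "bathrooms", "area_scaled"]].to_numpy()
y = df["price_scaled"].to_numpy()
print(X.shape, y.shape)  # confirm input shape before specifying the model

# After fitting: y_pred_scaled = model.predict(X)
# df["predicted_price"] = price_scaler.inverse_transform(y_pred_scaled).ravel()
```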
Module 2
Monday
Start with the Regression script that we prepared during today's class. Replace the Auto MPG dataset provided in the tensorflow exercise with the Auto Imports dataset provided in the UCI Machine Learning Repository. Specify a model with the following target and features.
- highway-mpg (continuous)
- num-of-cylinders (categorical)
- engine-size (continuous)
- horsepower (continuous)
- curb-weight (continuous)
Specify and train both a multiple-input linear regression and a multiple-input DNN regression (a minimal sketch of the two architectures follows). Which of the two models produces a better loss metric (see this link for an explanation of the loss function)? Produce a plot that supports your answer. Return to the remainder of the variables from the dataset and add additional continuous and categorical features with the intent of improving your loss metric. Produce a plot that demonstrates the value of your model. What is the best model your team was able to produce?
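A minimal sketch of the two model families, assuming the features have already been encoded as a numeric array; the placeholder data here stands in for the Auto Imports columns.

```python
import numpy as np
import tensorflow as tf

# Placeholder feature matrix standing in for engine-size, horsepower,
# curb-weight, and the encoded num-of-cylinders columns
train_features = np.random.rand(100, 4).astype("float32")
train_labels = np.random.rand(100, 1).astype("float32")  # highway-mpg stand-in

# Normalization layer adapted to the training features
# (in TF < 2.6 this lives at tf.keras.layers.experimental.preprocessing.Normalization)
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(train_features)

linear_model = tf.keras.Sequential([normalizer, tf.keras.layers.Dense(1)])
dnn_model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
for m in (linear_model, dnn_model):
    m.compile(optimizer=tf.keras.optimizers.Adam(0.001), loss="mean_absolute_error")
```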
For tomorrow, read and produce the script on Overfit and underfit in preparation for class.
Tuesday
Using the script you produced with the Higgs dataset, answer the following questions.
- Describe the dataset. What type of variable is the target? How many features are being used? How many observations are in the training dataset? How many are used in the validation set?
- How did each of the four models perform (tiny, small, medium and large)? Which of the four models performed the best? Which ones performed the worst? Why in your estimation did certain models perform better? Produce a plot that illustrates and compares all four models.
- Apply regularization, then add a dropout layer, and finally combine regularization with a dropout layer (a sketch of the combined model follows this list). Produce a plot that illustrates and compares all four models. Why, in your estimation, did certain models perform better?
- What is an overfit model? Why is it important to address it? What are four different ways we have addressed an overfit model thus far?
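A minimal sketch of the combined remedy (L2 regularization plus dropout), following the pattern of the combined model in the Overfit and underfit tutorial; the 28-feature input shape matches the Higgs dataset.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

combined_model = tf.keras.Sequential([
    # The L2 penalty shrinks large weights; dropout randomly zeroes activations
    layers.Dense(512, activation="elu",
                 kernel_regularizer=regularizers.l2(0.0001),
                 input_shape=(28,)),
    layers.Dropout(0.5),
    layers.Dense(512, activation="elu",
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.5),
    layers.Dense(1),
])
combined_model.compile(optimizer="adam",
                       loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                       metrics=["accuracy"])
```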
For next class, we will be shifting gears back to image processing and then on to computer vision with convolutional neural networks. Please complete the script Load and preprocess images and then also watch the Convolutions and pooling video by Laurence Moroney of Google Developers. After watching the video, answer the following questions.
- In this exercise you manually applied a 3x3 array as a filter to an image of two people ascending an outdoor staircase. Modify the existing filter and if needed the associated weight in order to apply your new filters to the image 3 times. Plot each result, upload them to your response, and describe how each filter transformed the existing image as it convolved through the original array and reduced the object size. What are you functionally accomplishing as you apply the filter to your original array? Why is the application of a convolving filter to an image useful for computer vision? Stretch goal: instead of using the misc.ascent() image from scipy, can you apply three filters and weights to your own selected image? Describe your results.
- Another useful method is pooling. Apply a 2x2 filter to one of your convolved images, and plot the result. In effect, what have you accomplished by applying this filter? Does there seem to be a logic (i.e., maximizing, averaging, or minimizing values) associated with the pooling filter provided in the example exercise (convolutions & pooling)? Did the resulting image increase or decrease in size? Why would this method be useful? Stretch goal: again, instead of using misc.ascent(), apply the pooling filter to one of your transformed images.
Finally, review the following images from class, then convolve the 3x3 filter over the 9x9 matrix and provide the resulting matrix (link to matrices). A sketch of this computation follows.
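A sketch of the by-hand convolution, so you can check your matrix arithmetic; the 9x9 values and the 3x3 kernel here are placeholders, so substitute the matrices from the class link.

```python
import numpy as np

def convolve2d(image, kernel, weight=1.0):
    """Slide a kernel across the image with no padding ('valid' convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) * weight
    return out

image = np.arange(81).reshape(9, 9)  # placeholder 9x9 matrix
kernel = np.array([[-1, -2, -1],
                   [0, 0, 0],
                   [1, 2, 1]])       # a common 3x3 edge-detection filter
print(convolve2d(image, kernel))     # the result is a 7x7 matrix
```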
Thursday
For today's class you loaded and preprocessed the tf_flowers TensorFlow dataset, and then trained a CNN in order to predict which of five different types of flowers a particular image showed. For the first part of this exercise, instead of using tf.keras.utils.get_file() to load your data, use tfds.load() to load a TensorFlow dataset as illustrated in the section Using TensorFlow Datasets. Using the relevant parts of the example script as a guide, fit a CNN model to your training data and validate using the beans dataset from TensorFlow Datasets, and then again train and validate using the eurosat dataset. Present your results and discuss the accuracy of each model.
Additionally, go to the image augmentation exercise and read through and become familiar with the many individual examples presented. Towards the end of this exercise is the Apply augmentation to a dataset example, which illustrates a resize-and-rescale image augmentation applied to the tf_flowers dataset. Apply this same method to both the beans and eurosat datasets (a minimal sketch follows). Did your model performance improve? How many epochs were you able to run, and how much time did it take? Present your results and discuss the accuracy of your augmented output for tomorrow's class.
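A minimal sketch of the tfds.load() plus resize-and-rescale pattern for the beans dataset; note that eurosat ships only a train split, so there you would carve out validation data with split syntax such as "train[:80%]".

```python
import tensorflow as tf
import tensorflow_datasets as tfds

(train_ds, val_ds), info = tfds.load(
    "beans", split=["train", "validation"], as_supervised=True, with_info=True)

IMG_SIZE = 180  # resize target; adjust to match your model's input shape
resize_and_rescale = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Resizing(IMG_SIZE, IMG_SIZE),
    tf.keras.layers.experimental.preprocessing.Rescaling(1.0 / 255),
])

# Apply the augmentation to every image, then batch and prefetch
train_ds = (train_ds
            .map(lambda x, y: (resize_and_rescale(x), y),
                 num_parallel_calls=tf.data.AUTOTUNE)
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))
```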
Module 3
Monday
Start with the Load CSV data script that we prepared during today's class. Following the Mixed data types section from the exercise, replace the Titanic dataset first with the Metro Interstate Traffic Volume Data Set and then with the Iris Dataset. You can use the read_csv() command from the pandas package to pull these datasets from the following web addresses.
- https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz
- https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Use the plot_model() command from tf.keras.utils to produce the plot that describes the input preprocessing step (a loading sketch follows these questions). Describe the plot of each model for the two dataset preprocessing steps. What does each box in the illustration represent? Are there different paths towards the final concatenation step? What is occurring at each step, and why is it necessary to execute before fitting your model? Train each model and produce the output (it is not necessary to validate or test). Describe the model output from both the Metro Interstate Traffic Volume dataset and the iris flowers dataset. What is the target for each dataset? How would you assess the accuracy of each model? Are you using a different metric for each one? Why is this so? What is each one measuring?
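One way to pull the two datasets, as a sketch; the iris column names are assumptions based on the TensorFlow iris tutorials.

```python
import pandas as pd

traffic = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00492/"
    "Metro_Interstate_Traffic_Volume.csv.gz")
iris = pd.read_csv(
    "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv",
    names=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
    header=0)

print(traffic.dtypes)  # note the mix of object (categorical) and numeric columns
print(iris.head())

# Once the preprocessing model is assembled as in the exercise, visualize it with:
# tf.keras.utils.plot_model(model, show_shapes=True, rankdir="LR")
# (plot_model requires the pydot and graphviz packages)
```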
For tomorrow, read and produce the script on Custom training: walkthrough in preparation for class.
Tuesday
Using the Custom training: walkthrough script we developed during today's class, write a 1-1/2 to 2-page report that provides a description of the following.
- the data used in the model
- how you created the tf.dataset
- how you specified your model architecture, including the input shape, the layers, and other functions
- how you trained the model
- how you optimized the model
- how you estimated loss including a plot that describes that change per epoch
- how you evaluated the model with your test dataset
- how you made some predictions, including three new ones of your own and their results
For tomorrow, read and produce the script on Classify structured data with feature columns in preparation for class.
Wednesday
Using the Classify structured data with feature columns script you prepared for today's class as a model example, train, validate, and test a model that has wealth class as the target, as follows.
- import the dataset city_persons.csv to your PyCharm environment
- Initially set the target to the least wealthy class, 2 in this case, and set all other wealth class outcomes (3, 4 & 5) to 0
- Train, validate and test your model
- Interpret and analyze your results. Did the model performance exhibit a particular trend?
For tomorrow, read and produce the script on Classify structured data using Keras Preprocessing Layers in preparation for class.
Thursday
Using the Classify structured data using Keras Preprocessing Layers script you prepared for today's class as a model example, train, validate, and test a model that has wealth class as the target, as follows.
- Import the dataset country_persons.csv to your PyCharm environment
- Initially set the target to the least wealthy class, 1 in this case, and set all other wealth class outcomes (2, 3, 4 & 5) to 0
- Train, validate and test your model
- Interpret and analyze your results. Did the model performance exhibit a particular trend?
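For both of this week's relabeling tasks (and Project 3 below), collapsing the wealth classes to a binary target is a one-liner in pandas; a sketch, assuming the column is named wealthC (the actual column name may differ in city_persons.csv and country_persons.csv).

```python
import pandas as pd

df = pd.read_csv("country_persons.csv")

# Least wealthy class (1) becomes the positive class; all others become 0
df["target"] = (df["wealthC"] == 1).astype(int)
print(df["target"].value_counts())
```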
Project 3
- Using either the Classify structured data with feature columns script or the Classify structured data using Keras Preprocessing Layers script with the country_persons.csv dataset, specify, train, and evaluate models that predict one class of wealth versus all other classes as a binary target. Amongst the five possible binary targets (1 versus all others, 2 versus all others, etc.), produce your results from the two models with the best and worst accuracy.
- Using the data and two models you produced in step 1, create a confusion matrix. You are welcome to use the following script. With your confusion matrix as a reference, analyze and discuss the two sets of results you produced.
- Again using either the Classify structured data with feature columns script or the Classify structured data using Keras Preprocessing Layers script with the country_persons.csv dataset, specify, train, and evaluate a model that predicts all class-of-wealth outcomes as a categorical target. Again create a confusion matrix in order to analyze and discuss your results. Revisit either script to modify your feature columns, in an attempt to improve the accuracy of your model that uses all five categorical wealth classes as the target. Analyze and discuss your progress and results.
Step 1. Define your confusion matrix plot command
import itertools
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)
    # Draw the matrix as an image with a color scale
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    # Annotate each cell with its count (or rate, if normalized)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
Step 2. Create objects that describe the true and predicted target values
y_true = ...  # assign the true target values to this object
y_pred = ...  # assign the predicted target values to this object
Step 3. Plot your confusion matrix
from sklearn.metrics import confusion_matrix

cnf_matrix = confusion_matrix(y_true, y_pred, labels=[add labels here])
np.set_printoptions(precision=2)
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=[add labels here],
                      title='title your plot')
plt.show()
Module 4
Wednesday
Using the Neural Style Transfer script we developed during today's class, replace the content and style images with two images of your own selection. Synthesize the two images by applying the following elements from the exercise.
- Define content and style representations
- Extracting style and content
- Implementing the style transfer algorithm
- Apply regularization term on the high frequency components
Describe your implementation of neural style transfer to your content and style images. How did the application of the four steps above produce your resulting work of art? Comment on what your work means in terms of your development as a data scientist at William & Mary, with forethought to the future (please feel free to take as much artistic license as you wish when answering this last question).
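If you want a quick baseline before (or alongside) the full from-scratch implementation, the pretrained stylization module from TensorFlow Hub, which is also referenced in the tutorial, can synthesize the same two images in a few lines; my_content.jpg and my_style.jpg are hypothetical file names.

```python
import tensorflow as tf
import tensorflow_hub as hub

def load_img(path, max_dim=512):
    # Decode, convert to float, and scale the longest side to max_dim
    img = tf.image.decode_image(tf.io.read_file(path), channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    scale = max_dim / max(img.shape[:2])
    new_shape = [int(img.shape[0] * scale), int(img.shape[1] * scale)]
    img = tf.image.resize(img, new_shape)
    return img[tf.newaxis, :]

hub_model = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
content = load_img("my_content.jpg")
style = load_img("my_style.jpg")
stylized = hub_model(tf.constant(content), tf.constant(style))[0]
```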
Stretch goal: Using the Deep Dream script, produce another version of your artwork.
For tomorrow, we will work on producing the script from the tensorflow exercise Text generation with an RNN during class in continued preparation for Project 4.
Thursday
Using the Text generation with an RNN script we developed during today's class, use the shakespeare.txt file to generate your own text that imitates Shakespeare. Produce your generated text and describe how it was produced, addressing the following aspects.
- Processing, vectorizing and predicting text
- Building and training the model
- Generating text
Stretch goal: Add the Advanced: Customized Training procedure at the end of the exercise by specifying a starting point and implementing curriculum learning.
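For Project 4 below, swapping in your own corpus only changes the loading step; a minimal sketch of the exercise's preprocessing, pointed at a local file (my_screenplay.txt is a hypothetical name):

```python
import tensorflow as tf

# Read the corpus and build the character vocabulary, as in the exercise
text = open("my_screenplay.txt", "rb").read().decode(encoding="utf-8")
vocab = sorted(set(text))
print(f"{len(text)} characters, {len(vocab)} unique")

# Map characters to integer ids (and back), as the exercise does
ids_from_chars = tf.keras.layers.experimental.preprocessing.StringLookup(
    vocabulary=list(vocab), mask_token=None)
chars_from_ids = tf.keras.layers.experimental.preprocessing.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)
all_ids = ids_from_chars(tf.strings.unicode_split(text, "UTF-8"))
```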
Project 4
Using the Text generation with an RNN script from Thursday, replace the shakespeare.txt file with a screenplay from your favorite movie (see the loading sketch above). Use the RNN model to generate new text from the source document and then use that machine-generated output as the basis for a 5 to 8 minute production on Monday. Be sure to assign a role to each team member such that everyone has a chance to perform. Use the text generated by your RNN model as the basis for your theatrical production, but also feel free to modify it as you please in order to make your presentation more entertaining. By Monday evening, answer the following questions as a group and post them to your GitHub index.
- How did you modify the data and python script in order to generate comprehensible text?
- Did your output seem to generate text that was relying more on the entire corpus of your screen play (i.e. more associated with long term memory) or was it also able to incorporate some more immediate sequential associations (i.e. more associated with short term memory)?
- Did you make any changes to the model itself?
- Were there any unexpected or surprising phrases or statements that were generated by your RNN?
As a stretch goal, for your Monday morning presentation, produce one or two images (using the neural style transfer and/or deep dream methods) that you can use as a virtual background to supplement your RNN generated and inspired theatrical production.
Present your Results
Summary
For the final assignment of this course, record an eight- to twelve-minute presentation of the applied machine learning model you will use as a prototype for your proposed future research plan. Your recorded presentation will be due by 5PM on August 8th.
Step 1. Write an abstract and give your presentation a title
Draft an abstract and give your presentation a title. Review and edit your abstract, which should be less than a page in length, single spaced. Post your abstract to your index in advance of the deadline.
Step 2. Prepare the slides for your presentation
Prepare to speak for eight to twelve minutes and include at least one presentation slide that addresses each of the following elements of your work.
-
A problem statement that introduces your selected topic, identifies significant goals associated with the implementation of your applied machine learning method, demonstrates why your problem is important, and describes and analyzes the complex nature of your problem including any process oriented causes and effects. Conclude your problem statement with a stated central research question. You are welcome to articulate a central research question in broad and general terms, given the abbreviated time frame for this investigation.
-
A description of the data that you are using as input for your applied machine learning methodology, including the source of the data and the different features (variables) as well as their data class (i.e., continuous or discrete). Be sure to include a description of your dataset's size (number of rows/observations as well as number of columns/variables/features) and provide context on how the data was collected.
-
Provide the specification for your applied machine learning model that presented the most promise in providing a solution to your problem. Include the section from your script that specifies your model architecture. Provide a brief description of how you implemented your code in practice.
-
Conclude with a section that preliminarily assesses model performance. If you have results from your implementation, you are welcome to add those in this section. Compare your preliminary results with those from the literature on your topic for a comparative assessment. If you are not able to produce preliminary results, provide a cursory literature review that includes two sources, presenting and describing their validation. With more time and project support, estimate what an ideal outcome would look like in terms of model validation.
Step 3. Record your presentation
Record your eight- to twelve-minute presentation and provide a link to it via your Zoom or Panopto folder on your GitHub index. Using any combined screen and audio recording tool is perfectly acceptable (Mac, Windows, Linux). Send your link to me as a message in Slack in advance of the August 8th deadline.
Links to Responses
- Huatao Chen
- Michael Cusack-Nelkin
- Lydia Danas
- Liz Goodwin
- Dylan Hartman
- Cori Ingram
- Maggie Kettelberger
- Alexandra MacNamara
- Aashni Manroa
- Thomas Miller
- Alaa Mokbel
- Ethan Nelson
- David Olsen
- Annabel Richter
- Gwen Wagner
- John Willis
- Tyler Yeatts
- Julia Lewis
Links to Responses
- Amy Hilla
- Emma Schmidt
- Natalie Larsen
- Alleah Thornhill
- Sarah Grace Lott
- Jacinta Das
- Margaret Delaney
- Austin Anderson
- Wayne Wu
- Andrew Evan
- Alejandro Eulacio
- Sean Frankel
- J.Mo Yang
- Jacob Atagi
- Antonio Marra
- Richard Bartlett
- Victoria Chang
- Conrad Ning
- Dominic Skeete
- Frank Qiu
- Pablo Solano
- Tommy Williams
- Suditi Shyamsunder
- Rachel Li
- Jeremy Swack
Links to Responses
- Alex Raposo
- Luke Denoncourt
- Caroline Wall
- Daniel Highland
- Maddy Mulder
- Roger Clanton
- Matthew Cheng
- Caroline Freshcorn
- Caitlin Bryant
- Maddie Sharpe
- Alex Huang
- John Hennin
- Devika Puri
- Annette Lopez Davila
- Kathryn Johnson
- Jennifer Lam
- Anthony J. Ventura
- Paul Brenkus
- Sarena Oberoi
- Lily Bartholomaus
- Darrin Lilly
- David Huang
- Tommy Williams
- Youjin Lee
- Chai Hibbert