Prompt Title: Q1.2.2

Created 1 year ago

#coding

Tech

Prompt Detail:

Here are the updated answers with explanations for questions 1.1.1 to 1.3.1:

Question 1.1.1: Find the stemmed version of the word "vegetables":

python
stemmed_message = vocab_mapping.where("Word", "vegetables").column("Stem").item(0)
stemmed_message

Explanation: The code above filters the vocab_mapping table to the row whose Word is "vegetables", reads that row's Stem column, and takes the first entry with item(0). The result is assigned to the variable stemmed_message.

Question 1.1.2: Find the stem in the dataset with the most words that are shortened to it:

python
most_stem = vocab_table.group("Stem").sort("count", descending=True).column("Stem").item(0)
most_stem

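Explanation: The code above groups vocab_table by stem, which produces one row per stem with a "count" column holding the number of words that share it. It then sorts by that count in descending order and takes the first stem, which is the one with the most words.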
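The same group-and-sort logic can be checked on toy data without the datascience library. This is only a minimal sketch: the (stem, word) pairs below are hypothetical stand-ins for the rows of vocab_table, not values from the real dataset.

```python
from collections import Counter

# Hypothetical (stem, word) pairs standing in for the rows of vocab_table.
rows = [
    ("veget", "vegetable"),
    ("veget", "vegetables"),
    ("veget", "vegetation"),
    ("run", "run"),
    ("run", "running"),
]

# Count how many words map to each stem (the role played by group("Stem")).
stem_counts = Counter(stem for stem, _ in rows)

# Take the stem with the largest count (the role played by sorting on
# "count" in descending order and reading the first row).
most_stem, count = stem_counts.most_common(1)[0]
print(most_stem, count)
```

If two stems were tied for the largest count, either approach would return just one of them, so it is worth checking for ties in the real data.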

Question 1.1.3: Find the longest word in the dataset whose stem wasn't shortened:

python
tbl_with_lens = vocab_table.with_columns(
    "Word Length", vocab_table.apply(len, "Word"),
    "Stem Length", vocab_table.apply(len, "Stem")
)
tbl_with_dif = tbl_with_lens.with_column("Difference", tbl_with_lens.column("Word Length") - tbl_with_lens.column("Stem Length"))
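# Table.sort is assumed to be stable here, so sort on the tie-breaker ("Word") first and on the primary key ("Word Length") last.
longest_uncut = tbl_with_dif.where("Difference", 0).sort("Word", descending=True).sort("Word Length", descending=True).column("Word").item(0)
longest_uncut

Explanation: The code above adds columns with the length of each word and of its stem, then a column with the difference between the two. It keeps the rows where the difference is 0 (the stem is as long as the word, so nothing was cut), sorts them alphabetically from Z to A, and finally sorts by word length in descending order. Because the word-length sort is applied last, it is the primary ordering, and the earlier Z-to-A sort only breaks ties among words of equal length. Applying the two sorts in the opposite order would return the alphabetically last uncut word rather than the longest one.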
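The selection rule for this question (longest word first, ties broken from Z to A) can be illustrated in plain Python. This is a sketch under assumptions: the (word, stem) pairs are made up for illustration, and a single composite sort key is used in place of the table operations.

```python
# Hypothetical (word, stem) pairs; not taken from the real dataset.
pairs = [
    ("albatross", "albatross"),
    ("batman", "batman"),
    ("vegetables", "veget"),
    ("zoo", "zoo"),
    ("alphabets", "alphabets"),
]

# Keep only words whose stem is as long as the word itself (difference of 0).
uncut = [word for word, stem in pairs if len(word) == len(stem)]

# Longest word wins; among equally long words, the alphabetically last wins.
longest_uncut = max(uncut, key=lambda word: (len(word), word))
print(longest_uncut)
```

Here "albatross" and "alphabets" tie on length, so the alphabetical tie-breaker decides between them. Ordering by the word alone would pick "zoo", which is why word length has to be the primary key.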

Question 1.2.1: Calculate the correlation coefficient for the association between the proportion of words that are "outer" and the proportion of words that are "space" for every movie in the dataset:

python
outer_su = (outer - np.mean(outer)) / np.std(outer)
space_su = (space - np.mean(space)) / np.std(space)

outer_space_r = np.mean(outer_su * space_su)
outer_space_r

Explanation: The code above calculates the standardized units for "outer" and "space" by subtracting the mean and dividing by the standard deviation element-wise. Then, it calculates the correlation coefficient by taking the mean of the product of the standardized units.
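The standard-units calculation can be reproduced with only the Python standard library to sanity-check the formula. This is a sketch, not the assignment's code: the function names and the proportions below are illustrative assumptions, and pstdev is used because it is the population standard deviation, which is what np.std computes by default.

```python
from statistics import mean, pstdev

def standard_units(values):
    """Convert values to standard units using the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def correlation(xs, ys):
    """Mean of the products of the two lists in standard units."""
    xs_su = standard_units(xs)
    ys_su = standard_units(ys)
    return mean([x * y for x, y in zip(xs_su, ys_su)])

# Hypothetical word proportions for five movies (illustrative values only).
outer = [0.001, 0.002, 0.003, 0.004, 0.005]
space = [0.002, 0.004, 0.006, 0.008, 0.010]

r = correlation(outer, space)
print(round(r, 6))
```

Because the toy "space" values are an exact multiple of the "outer" values, r comes out at 1 up to floating-point error. Real data would give a value strictly between -1 and 1.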

Question 1.3.1: Draw a horizontal bar chart with two bars that show the proportion of Comedy movies in each dataset:

python
def comedy_proportion(table):
    # Return the proportion of movies in a table that have the Comedy genre.
    return np.count_nonzero(table.column("Genre") == "comedy") / table.num_rows

proportions_table = Table().with_columns(
    "Dataset", ["Training", "Test"],
    "Comedy Proportion", [comedy_proportion(train_movies), comedy_proportion(test_movies)]
)

proportions_table.barh("Dataset", "Comedy Proportion")

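Explanation: The comedy_proportion function counts the rows whose Genre is "comedy" and divides by the total number of rows, giving the proportion of comedy movies in the table. The comparison is case-sensitive, so the string must match the genre labels exactly as they are stored in the data. We then build a new table with the dataset names and their comedy proportions, and plot it as a horizontal bar chart.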
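The proportion calculation itself can be checked on plain lists. This is a minimal sketch with hypothetical genre labels. It takes a list of genres rather than a table, and it prints a rough text bar per dataset in place of the chart.

```python
def comedy_proportion(genres):
    """Return the fraction of labels that are exactly "comedy" (case-sensitive)."""
    return sum(1 for genre in genres if genre == "comedy") / len(genres)

# Hypothetical genre labels standing in for the Genre columns of the two tables.
train_genres = ["comedy", "thriller", "comedy", "thriller"]
test_genres = ["comedy", "thriller", "thriller", "thriller"]

proportions = {
    "Training": comedy_proportion(train_genres),
    "Test": comedy_proportion(test_genres),
}

# Print one text bar per dataset, scaled to 20 characters for a proportion of 1.
for name, proportion in proportions.items():
    print(f"{name:<8} {'#' * round(proportion * 20)} {proportion:.2f}")
```

A label such as "Comedy" with a capital C would not be counted by this comparison, which is the same behaviour as the equality check in the table version.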

RJ
