Prompt Title: Expected row sum: 1.

Created 1 year ago

Prompt Detail:

Question 1.1.1 Assign stemmed_message to the stemmed version of the word "vegetables".

python
stemmed_message = vocab_mapping.where("Word", "vegetables").column("Stem")[0]
stemmed_message

In this question, we search for the stemmed version of the word "vegetables" in the vocab_mapping table and assign it to the variable stemmed_message.

Question 1.1.2 What stem in the dataset has the most words that are shortened to it? Assign most_stem to that stem.

python
most_stem = vocab_mapping.group("Stem").sort("count", descending=True).column("Stem")[0]
most_stem

In this question, we first group the vocab_mapping table by the "Stem" column, which produces a "count" column holding the number of words that map to each stem. Then, we sort the resulting table by "count" in descending order and take the stem in the first row, which is the stem with the most words.
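The group-then-sort idea can also be checked without the datascience library. The sketch below uses only the standard library; the (word, stem) pairs and the names toy_pairs and most_common_stem are made up for illustration and are not taken from vocab_mapping.

```python
from collections import Counter

# Hypothetical (word, stem) pairs standing in for rows of vocab_mapping.
toy_pairs = [
    ("vegetable", "veget"),
    ("vegetables", "veget"),
    ("vegetation", "veget"),
    ("running", "run"),
    ("runs", "run"),
    ("space", "space"),
]


def most_common_stem(pairs):
    """Return the stem that the largest number of words map to."""
    # Count how many words share each stem (the analogue of group("Stem")).
    counts = Counter(stem for _, stem in pairs)
    # most_common(1) returns a one-element list holding a (stem, count) tuple.
    stem, _ = counts.most_common(1)[0]
    return stem


toy_most_stem = most_common_stem(toy_pairs)
```

With these toy pairs, toy_most_stem is "veget", since three of the six words share that stem.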

Question 1.1.3 What is the longest word in the dataset whose stem wasn't shortened? Assign that to longest_uncut. Break ties alphabetically from Z to A (so if your options are "albatross" or "batman", you should pick "batman").

python
tbl_with_lens = vocab_mapping.with_columns(
    "Word Length", vocab_mapping.apply(len, "Word"),
    "Stem Length", vocab_mapping.apply(len, "Stem")
)
tbl_with_dif = tbl_with_lens.with_column(
    "Difference", tbl_with_lens.column("Word Length") - tbl_with_lens.column("Stem Length")
)
# Table.sort takes a single column, so the two sort keys are handled in separate steps.
uncut = tbl_with_dif.where("Difference", 0)
longest = uncut.where("Word Length", max(uncut.column("Word Length")))
longest_uncut = longest.sort("Word", descending=True).column("Word")[0]
longest_uncut

In this question, we first add columns to the vocab_mapping table containing the length of each word and the length of its stem. Then, we add another column to store the difference between the two lengths. If the word is not shortened, the difference will be 0. We keep the rows where the difference is 0, narrow them to the words of maximum length, and sort those by word in descending order so that ties are broken from Z to A. The first word in the result is the longest word whose stem wasn't shortened.
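The tie-breaking rule is easy to get wrong, so here is a minimal sketch of the same selection in plain Python. The toy pairs and the function name longest_uncut_word are hypothetical and only illustrate the rule; they are not part of the assignment's data.

```python
# Hypothetical (word, stem) pairs; not taken from vocab_mapping.
toy_pairs = [
    ("planet", "planet"),
    ("rocket", "rocket"),
    ("moon", "moon"),
    ("vegetables", "veget"),
]


def longest_uncut_word(pairs):
    """Longest word whose stem has the same length; ties go to the later word alphabetically."""
    uncut = [word for word, stem in pairs if len(word) == len(stem)]
    # Tuples compare element by element, so max() prefers the greater length first
    # and, among words of equal length, the alphabetically later word.
    return max(uncut, key=lambda word: (len(word), word))


toy_longest_uncut = longest_uncut_word(toy_pairs)
```

Here "vegetables" is excluded because its stem is shorter, and "rocket" wins the length tie with "planet" because it comes later in the alphabet.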

Question 1.2.1 Calculate the correlation coefficient for the association between proportion of words that are "outer" and the proportion of words that are "space" for every movie in the dataset, and assign it to outer_space_r.

python
outer_su = (outer - np.mean(outer)) / np.std(outer)
space_su = (space - np.mean(space)) / np.std(space)
outer_space_r = np.mean(outer_su * space_su)
outer_space_r

In this question, we first convert the "outer" and "space" columns to standard units by subtracting the mean and dividing by the standard deviation. Then, we compute the element-wise product of the two arrays in standard units and take the mean to find the correlation coefficient.
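To see the standard-units recipe at work on numbers that can be checked by hand, here is a small sketch using only the standard library. The function names and the toy lists are assumptions for illustration. It uses pstdev, the population standard deviation, which matches the default behavior of np.std.

```python
from statistics import fmean, pstdev


def standard_units(values):
    """Convert values to standard units: subtract the mean, divide by the SD."""
    mean = fmean(values)
    sd = pstdev(values)  # population SD, the same convention as np.std's default
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Correlation coefficient as the mean of products of standard units."""
    products = [x * y for x, y in zip(standard_units(xs), standard_units(ys))]
    return fmean(products)


# Toy data: ys is an exact increasing linear function of xs.
toy_r = correlation([1, 2, 3, 4], [2, 4, 6, 8])
```

For a perfectly linear increasing relationship like this one, the result is 1 up to floating-point rounding; reversing one of the lists gives -1.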

Question 1.2.2 Choose two different words in the dataset with a correlation higher than 0.2 or smaller than -0.2 that are not outer and space and plot a scatter plot with a line of best fit for them.

python
word_x = "chief"
word_y = "execut"
arr_x = movies.column(word_x)
arr_y = movies.column(word_y)
x_su = (arr_x - np.mean(arr_x)) / np.std(arr_x)
y_su = (arr_y - np.mean(arr_y)) / np.std(arr_y)
r = np.mean(x_su * y_su)
slope = r * np.std(arr_y) / np.std(arr_x)
intercept = np.mean(arr_y) - slope * np.mean(arr_x)
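
The slope and intercept of the line of best fit follow from r: the slope is r times the ratio of the standard deviations, and the intercept is chosen so that the line passes through the point of averages. A minimal sketch in plain Python, with a hypothetical function name fit_line and toy data that are not from the movies table:

```python
from statistics import fmean, pstdev


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line in original units."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # r is the mean of the products of the two variables in standard units.
    r = fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)])
    slope = r * sd_y / sd_x
    # The regression line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1.
toy_slope, toy_intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Because the toy points lie exactly on y = 2x + 1, the fitted slope and intercept come out as 2 and 1 up to floating-point rounding.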
rh