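.loc[row_label, col_label] is label-based; .iloc[row_int, col_int] is integer position-based.
df[df['col'] > 5] builds a boolean mask and keeps only the rows where it is True; combine masks with & | ~ (not Python's and/or/not).
df.groupby('col').agg({'other': 'mean'}) follows the split → apply → combine pattern.
pd.merge(df1, df2, on='key', how='inner'). Know the differences: inner keeps only keys found in both frames, left keeps every row of df1, and outer keeps keys from either frame, filling the missing side with NaN.
df.isna(), dropna(), fillna(val). NaN propagates through arithmetic (1 + NaN is NaN), but aggregations such as sum() and mean() skip NaN by default.
df['col'].apply(func) applies func element-wise; df.apply(func, axis=1) applies it row by row.
Don't use Python's and/or in boolean masks: always use & and | and wrap each condition in parentheses, since & and | bind more tightly than comparisons like >. This is a classic exam trap.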
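A small runnable sketch of the points above, on a made-up four-row DataFrame. The column values, the index 10..40, and the `left`/`right` frames are hypothetical, chosen only so that labels differ from positions and so that each join type keeps a different set of keys.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the index 10..40 makes labels differ from positions.
df = pd.DataFrame(
    {"name": ["Ann", "Bo", "Cy", "Di"],
     "score": [88.0, np.nan, 72.0, 95.0],
     "grade": ["A", "B", "B", "A"]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[10, "name"]    # row whose label is 10
by_position = df.iloc[0, 0]      # first row, first column (the same cell here)

# Masks: & | ~ with each comparison wrapped in parentheses.
high_a = df[(df["score"] > 80) & (df["grade"] == "A")]
not_a = df[~(df["grade"] == "A")]

# Python's `and` calls bool() on a whole Series, which pandas refuses (ValueError).
try:
    df[(df["score"] > 80) and (df["grade"] == "A")]
    and_raised = False
except ValueError:
    and_raised = True

# NaN propagates through arithmetic, but mean() skips it by default.
bumped = df["score"] + 5
mean_score = df["score"].mean()
filled = df["score"].fillna(0)

# Split by grade, apply mean, combine into one frame.
by_grade = df.groupby("grade").agg({"score": "mean"})

# How the join type changes which keys survive.
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": ["B", "C", "D"]})
inner = pd.merge(left, right, on="key", how="inner")      # keys in both
left_join = pd.merge(left, right, on="key", how="left")   # every row of left
outer = pd.merge(left, right, on="key", how="outer")      # keys from either side

# Element-wise apply on a Series vs row-wise apply on a DataFrame.
name_lengths = df["name"].apply(len)
tags = df.apply(lambda row: f"{row['name']}:{row['grade']}", axis=1)
```

The row with the missing score drops out of `high_a` because a comparison with NaN is False, and in `left_join` the unmatched key 1 gets NaN in column `y`.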
Given a DataFrame df with columns name, score, grade, write code to get the mean score for each grade group, sorted in descending order.
df.groupby('grade')['score'].mean().sort_values(ascending=False)
What is the difference between df.loc[0] and df.iloc[0] when the DataFrame index is [10, 20, 30]?
df.loc[0] raises a KeyError (0 is not a label in the index). df.iloc[0] returns the first row (the one labelled 10).
Write code to merge scores_df and roster_df, matching on student_id.
pd.merge(scores_df, roster_df, on='student_id', how='inner')
Metacharacters: . any character (except newline) | \d digit | \w word char | \s whitespace
Quantifiers: * zero or more | + one or more | ? zero or one | {m,n} between m and n times
Anchors: ^ start of string | $ end of string
Classes and groups: [abc] character class | [^abc] negated class | (a|b) alternation | () capturing group | (?:) non-capturing group
re.search(), re.findall(), re.sub(), re.fullmatch()
df['col'].str.extract(r'pattern'), str.contains(), str.replace()
Quantifiers are greedy by default; put ? after one to make it lazy: .*? matches as little as possible. On exams, trace through the string character by character.
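A quick sketch exercising the cheat sheet with Python's `re` module and the pandas `.str` accessor. All input strings (the HTML-like text, the email addresses, the `id-…` Series) are made up for illustration.

```python
import re

import pandas as pd

text = "<b>bold</b> and <i>italic</i>"

# Greedy .* grabs as much as it can (up to the LAST '>'); lazy .*? stops at the first '>'.
greedy = re.findall(r"<.*>", text)
lazy = re.findall(r"<.*?>", text)

# search: first match anywhere in the string; fullmatch: the entire string must match.
first_number = re.search(r"\d+", "order 66, row 7").group()
exact = re.fullmatch(r"\d+", "66")       # a Match object
not_exact = re.fullmatch(r"\d+", "66a")  # None

# findall returns only what the capturing groups captured; (?:...) groups without capturing.
users = re.findall(r"(\w+)@(?:gmail|yahoo)\.com", "ann@gmail.com, bo@yahoo.com")

# sub replaces every match; [^0-9] is a negated class (anything except a digit).
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")

# The pandas .str versions apply a pattern to every element of a Series.
s = pd.Series(["id-17", "id-203", "none"])
ids = s.str.extract(r"id-(\d+)")                # one column per capturing group, NaN if no match
has_id = s.str.contains(r"id-\d+")              # boolean mask, usable in df[...]
masked = s.str.replace(r"\d", "#", regex=True)  # regex=True is explicit: recent pandas defaults to False
```

Tracing `greedy` by hand is good exam practice: `.*` first runs to the end of the string, then backtracks only until it finds a `>`, which is the final character, so the whole string is one match.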
Write a regex that matches the format XXX-XXX-XXXX, where each X is a digit.
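\d{3}-\d{3}-\d{4} (re.search will also find this inside a longer string; use re.fullmatch to require the whole string to have exactly this format)
What does re.findall(r'\b\w{4}\b', 'the quick brown fox') return?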
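[] (an empty list). \b\w{4}\b matches only words of exactly 4 characters, and there are none: "the" and "fox" have 3, while "quick" and "brown" have 5. re.findall(r'\b\w{5}\b', 'the quick brown fox') would return ['quick', 'brown'].
To reduce overplotting in scatter plots, use transparency (alpha=0.3), jitter, hexbin, or contour plots.
Aggregate functions: COUNT(*), SUM(col), AVG(col), MAX(col), MIN(col).
Test for missing values with IS NULL / IS NOT NULL, not = NULL (any comparison with NULL evaluates to NULL, which WHERE treats as false).
SELECT DISTINCT col removes duplicate values.
df.groupby('col').agg() → GROUP BY + aggregation; df[df['col'] > 5] → WHERE; merge → JOIN.
Example: average salary per department, only for departments with more than 5 employees, highest average first:
SELECT department, AVG(salary) AS avg_sal FROM employees GROUP BY department HAVING COUNT(*) > 5 ORDER BY avg_sal DESC;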
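A sketch that runs the SQL points above against an in-memory SQLite database (Python's built-in `sqlite3`) and then reproduces the example query in pandas. The `employees` rows are hypothetical: six people in `eng`, two in `ops`, one with an unknown (NULL) salary.

```python
import sqlite3

import pandas as pd

# Hypothetical employees table.
rows = [("eng", s) for s in (100, 110, 120, 130, 140, 150)] + [("ops", 90), ("ops", None)]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (department TEXT, salary REAL)")
con.executemany("INSERT INTO employees VALUES (?, ?)", rows)

# `= NULL` is never true, so it counts nothing; IS NULL finds the missing salary.
eq_null = con.execute("SELECT COUNT(*) FROM employees WHERE salary = NULL").fetchone()[0]
is_null = con.execute("SELECT COUNT(*) FROM employees WHERE salary IS NULL").fetchone()[0]

distinct_depts = con.execute(
    "SELECT DISTINCT department FROM employees ORDER BY department"
).fetchall()

# The example query from the notes.
sql_result = con.execute(
    "SELECT department, AVG(salary) AS avg_sal FROM employees "
    "GROUP BY department HAVING COUNT(*) > 5 ORDER BY avg_sal DESC"
).fetchall()

# Rough pandas equivalent: GROUP BY + aggregation, then the HAVING filter, then ORDER BY.
df = pd.DataFrame(rows, columns=["department", "salary"])
summary = df.groupby("department").agg(avg_sal=("salary", "mean"), n=("salary", "size"))
pd_result = summary[summary["n"] > 5].sort_values("avg_sal", ascending=False)
```

`"size"` counts rows including missing salaries, like `COUNT(*)`; both `AVG` and pandas `mean` ignore the missing value.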
What does HAVING COUNT(*) > 1 do, and why can't you replace it with WHERE COUNT(*) > 1?
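One way to check your answer, sketched with an in-memory SQLite database and a hypothetical `orders` table. HAVING filters whole groups after aggregation, while WHERE filters individual rows before grouping, so an aggregate like COUNT(*) does not exist yet at the WHERE stage. SQLite refuses to run the WHERE version; other databases word the error differently.

```python
import sqlite3

# Hypothetical orders: customer 'a' has 3 orders, 'b' has 1, 'c' has 2.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", 20), ("a", 5), ("b", 50), ("c", 15), ("c", 1)],
)

# HAVING keeps only the groups (customers) with more than one order.
repeat = con.execute(
    "SELECT customer, COUNT(*) FROM orders "
    "GROUP BY customer HAVING COUNT(*) > 1 ORDER BY customer"
).fetchall()

# WHERE runs row by row before grouping, so an aggregate there is rejected.
try:
    con.execute("SELECT customer FROM orders WHERE COUNT(*) > 1 GROUP BY customer")
    where_failed = False
except sqlite3.Error:
    where_failed = True

# The two combine: WHERE drops small orders first, then HAVING filters the groups.
big_repeat = con.execute(
    "SELECT customer, COUNT(*) FROM orders WHERE amount >= 10 "
    "GROUP BY customer HAVING COUNT(*) > 1"
).fetchall()
```

In the last query, customer `c` drops out because its 1.0 order is removed by WHERE before counting, leaving only one qualifying row.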