ac = Autocoder()
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
   gender                                review
0  female                  I loved this doctor!
1    male  This doctor was absolutely terrible.
After autocoding for sentiment, the dataframe now has extra columns:
Autocodes text for user-specified topics. The label field is the name of the topic as a string (or a list of them).
Let’s prepare a toy dataset:
comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'],
    'comments': comments,
})
df.head()
   over_18                                   comments
0      yes  What is your favorite sitcom of all time?
1       no                     I cannot wait to vote!
After autocoding, the dataframe has a new column for each custom topic:
comments = ["I'm nervous about tomorrow.",
            'I got a promotion at work!',
            "My best friend was in a car accident.",
            "I hate it when I'm cut off in traffic."]
df = pd.DataFrame({
    'over_18': ['yes', 'no', 'yes', 'yes'],
    'comments': comments,
})
df.head()
Encode texts as semantically meaningful vectors using Latent Dirichlet Allocation.
comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'] * 5,
    'comments': comments * 5,
})
df.head()
preprocessing texts...
fitting model...
iteration: 1 of max_iter: 5
iteration: 2 of max_iter: 5
iteration: 3 of max_iter: 5
iteration: 4 of max_iter: 5
iteration: 5 of max_iter: 5
done.
done.
df.head()
   over_18                                   comments  time|favorite|sitcom  sitcom|vote|wait  \
0      yes  What is your favorite sitcom of all time?              0.148763          0.093341
1       no                     I cannot wait to vote!              0.085687          0.097749
2      yes  What is your favorite sitcom of all time?              0.148763          0.093341
3       no                     I cannot wait to vote!              0.085687          0.097749
4      yes  What is your favorite sitcom of all time?              0.148763          0.093341

   wait|vote|favorite  time|sitcom|favorite  favorite|sitcom|wait  wait|time|favorite  \
0            0.080723              0.128911              0.109816            0.084724
1            0.142486              0.084145              0.086931            0.099608
2            0.080723              0.128911              0.109816            0.084724
3            0.142486              0.084145              0.086931            0.099608
4            0.080723              0.128911              0.109816            0.084724

   sitcom|favorite|vote  vote|wait|time  favorite|vote|time  vote|favorite|sitcom
0              0.093611        0.080860            0.091758              0.087493
1              0.091913        0.114741            0.093014              0.103728
2              0.093611        0.080860            0.091758              0.087493
3              0.091913        0.114741            0.093014              0.103728
4              0.093611        0.080860            0.091758              0.087493
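Each row of the table above holds one weight per topic column, and the weights in a row sum to roughly 1. As a minimal sketch of how such output might be read, the snippet below hard-codes the two distinct rows from the table and picks the highest-weighted topic for each text. The helper `dominant_topic` is a hypothetical name used for illustration only; it is not part of Autocoder.

```python
from typing import List, Tuple

# Topic column names, copied in order from the table above.
topics = [
    "time|favorite|sitcom", "sitcom|vote|wait", "wait|vote|favorite",
    "time|sitcom|favorite", "favorite|sitcom|wait", "wait|time|favorite",
    "sitcom|favorite|vote", "vote|wait|time", "favorite|vote|time",
    "vote|favorite|sitcom",
]

# The two distinct rows of topic weights, copied from the table above.
weights = {
    "What is your favorite sitcom of all time?": [
        0.148763, 0.093341, 0.080723, 0.128911, 0.109816,
        0.084724, 0.093611, 0.080860, 0.091758, 0.087493,
    ],
    "I cannot wait to vote!": [
        0.085687, 0.097749, 0.142486, 0.084145, 0.086931,
        0.099608, 0.091913, 0.114741, 0.093014, 0.103728,
    ],
}


def dominant_topic(row: List[float]) -> Tuple[str, float]:
    """Hypothetical helper: return the topic name with the largest weight."""
    best = max(range(len(row)), key=row.__getitem__)
    return topics[best], row[best]


for text, row in weights.items():
    name, weight = dominant_topic(row)
    print(f"{text!r}: {name} ({weight})")
```

With only two short, repeated texts the weights are close to uniform, so the "dominant" topic here is only marginally stronger than the rest.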
Autocoder.code_callable
Autocoder.code_callable (docs, df, fn)
Autocodes text using any user-specified function. The fn parameter must be a Callable that returns a dictionary for each text in docs, where the keys are the desired column names and the values are scores or probabilities.
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
   gender                                review
0  female                  I loved this doctor!
1    male  This doctor was absolutely terrible.
def some_function(x):
    val = int('terrible' in x)
    return {'has_the_word_terrible?': val}
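The contract described for fn (one dictionary per text, mapping column names to scores) can be illustrated without the library. The following is a rough sketch in plain Python, not Autocoder's actual implementation: `apply_callable` is a hypothetical helper that calls the function on every text and gathers one list of values per returned key, which is the shape a new dataframe column would need.

```python
from typing import Callable, Dict, List


def apply_callable(docs: List[str],
                   fn: Callable[[str], Dict[str, float]]) -> Dict[str, List[float]]:
    """Hypothetical helper: call fn on each text and collect one column per key."""
    columns: Dict[str, List[float]] = {}
    for doc in docs:
        for name, score in fn(doc).items():
            columns.setdefault(name, []).append(score)
    return columns


def some_function(x: str) -> Dict[str, int]:
    # 1 if the text contains the word "terrible", otherwise 0.
    val = int('terrible' in x)
    return {'has_the_word_terrible?': val}


reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
new_columns = apply_callable(reviews, some_function)
print(new_columns)  # {'has_the_word_terrible?': [0, 1]}
```

Each key in `new_columns` lines up row by row with the input texts, so a result like this could be attached to the dataframe as extra columns.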