import pandas as pd

ac = Autocoder()
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
   gender  review
0  female  I loved this doctor!
1  male    This doctor was absolutely terrible.
After autocoding for sentiment, the dataframe now has extra columns:
Autocodes text for user-specified topics. The label field is the name of the topic as a string (or a list of such names).
Let’s prepare a toy dataset:
comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'],
    'comments': comments,
})
df.head()
   over_18  comments
0  yes      What is your favorite sitcom of all time?
1  no       I cannot wait to vote!
After autocoding, the dataframe has a new column for each custom topic:
comments = ["I'm nervous about tomorrow.", 'I got a promotion at work!',
            "My best friend was in a car accident.", "I hate it when I'm cut off in traffic."]
df = pd.DataFrame({
    'over_18': ['yes', 'no', 'yes', 'yes'],
    'comments': comments,
})
df.head()
Encodes texts as semantically meaningful vectors using Latent Dirichlet Allocation.
comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'] * 5,
    'comments': comments * 5,
})
df.head()
Autocodes text using any user-specified function. The fn parameter must be a Callable that returns a dictionary for each text in docs, where the keys are the desired column names and the values are scores or probabilities.
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
   gender  review
0  female  I loved this doctor!
1  male    This doctor was absolutely terrible.
def some_function(x):
    # 1 if the text contains the word 'terrible', otherwise 0
    val = int('terrible' in x)
    return {'has_the_word_terrible?': val}
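To make the contract on fn concrete, here is a minimal sketch of how a function like some_function could be applied to each text and its returned dictionaries gathered into one list per column name. The helper apply_fn_to_docs is hypothetical and written only for illustration; it is not part of the library and may differ from what the Autocoder does internally. It also assumes fn returns the same keys for every text.

```python
from typing import Callable, Dict, List


def some_function(x: str) -> Dict[str, int]:
    # Same toy function as above: 1 if the text contains 'terrible', else 0
    val = int('terrible' in x)
    return {'has_the_word_terrible?': val}


def apply_fn_to_docs(
    docs: List[str],
    fn: Callable[[str], Dict[str, float]],
) -> Dict[str, List[float]]:
    """Hypothetical helper (an assumption, not the library's API):
    call fn on each text and collect the values under each returned key."""
    columns: Dict[str, List[float]] = {}
    for doc in docs:
        # fn returns {column_name: score} for a single text
        for name, value in fn(doc).items():
            columns.setdefault(name, []).append(value)
    return columns


reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
new_columns = apply_fn_to_docs(reviews, some_function)
print(new_columns)  # {'has_the_word_terrible?': [0, 1]}
```

Each key in the result corresponds to one new column, with one value per text in the same order as docs, so it could be attached to the dataframe column by column.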