Autocoder.code_sentiment
[source]
`Autocoder.code_sentiment(docs, df, batch_size=8, binarize=False, threshold=0.5)`
Autocodes text for positive or negative sentiment.
Let's prepare a toy dataset:
```python
import pandas as pd

ac = Autocoder()
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
```
| | gender | review |
|---|---|---|
| 0 | female | I loved this doctor! |
| 1 | male | This doctor was absolutely terrible. |
After autocoding for sentiment, the dataframe now has extra columns:
```python
result_df = ac.code_sentiment(df['review'].values, df)
result_df.head()
```
| | gender | review | negative | positive |
|---|---|---|---|---|
| 0 | female | I loved this doctor! | 0.005034 | 0.994966 |
| 1 | male | This doctor was absolutely terrible. | 0.981789 | 0.018211 |
```python
assert result_df[result_df['gender']=='female']['negative'].values[0] < 0.1
assert result_df[result_df['gender']=='female']['positive'].values[0] > 0.9
assert result_df[result_df['gender']=='male']['negative'].values[0] > 0.9
assert result_df[result_df['gender']=='male']['positive'].values[0] < 0.1
```
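Passing `binarize=True` presumably converts these probabilities into 0/1 labels using `threshold`. The equivalent post-processing can be sketched in plain pandas (the `result_df` below is reconstructed from the example output above, and the thresholding is an assumption about the option's behavior, not the library's implementation):

```python
import pandas as pd

# Reconstructed from the example output above.
result_df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': ["I loved this doctor!", "This doctor was absolutely terrible."],
    'negative': [0.005034, 0.981789],
    'positive': [0.994966, 0.018211],
})

# What binarize=True with threshold=0.5 would plausibly amount to:
threshold = 0.5
for col in ['negative', 'positive']:
    result_df[col] = (result_df[col] > threshold).astype(int)

print(result_df[['negative', 'positive']].values.tolist())
```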
Autocoder.code_custom_topics
[source]
`Autocoder.code_custom_topics(docs, df, labels, batch_size=8, binarize=False, threshold=0.5)`
Autocodes text for user-specified topics. The `labels` parameter is the name of a topic as a string (or a list of topic names).
Let's prepare a toy dataset:
```python
comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'],
    'comments': comments,
})
df.head()
```
| | over_18 | comments |
|---|---|---|
| 0 | yes | What is your favorite sitcom of all time? |
| 1 | no | I cannot wait to vote! |
After autocoding, the dataframe has a new column for each custom topic:
```python
result_df = ac.code_custom_topics(df['comments'].values, df, labels=['television', 'film', 'politics'])
result_df.head()
```
| | over_18 | comments | television | film | politics |
|---|---|---|---|---|---|
| 0 | yes | What is your favorite sitcom of all time? | 0.981327 | 0.012260 | 0.000157 |
| 1 | no | I cannot wait to vote! | 0.000518 | 0.004943 | 0.936988 |
```python
assert result_df[result_df['over_18']=='yes']['television'].values[0] > 0.9
assert result_df[result_df['over_18']=='yes']['film'].values[0] < 0.1
assert result_df[result_df['over_18']=='yes']['politics'].values[0] < 0.1
assert result_df[result_df['over_18']=='no']['television'].values[0] < 0.1
assert result_df[result_df['over_18']=='no']['film'].values[0] < 0.1
assert result_df[result_df['over_18']=='no']['politics'].values[0] > 0.9
```
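Since each custom topic becomes its own score column, a single "best topic" label per row can be derived afterwards with plain pandas (`idxmax` over the topic columns); this is post-processing, not part of the `Autocoder` API. The `result_df` below is reconstructed from the example output above:

```python
import pandas as pd

# Reconstructed from the example output above.
result_df = pd.DataFrame({
    'over_18': ['yes', 'no'],
    'comments': ["What is your favorite sitcom of all time?", 'I cannot wait to vote!'],
    'television': [0.981327, 0.000518],
    'film': [0.012260, 0.004943],
    'politics': [0.000157, 0.936988],
})

# Pick the highest-scoring topic for each row.
result_df['top_topic'] = result_df[['television', 'film', 'politics']].idxmax(axis=1)
print(result_df['top_topic'].tolist())
```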
Autocoder.code_emotion
[source]
`Autocoder.code_emotion(docs, df, batch_size=8, binarize=False, threshold=0.5)`
Autocodes text for emotion (joy, anger, fear, and sadness).
```python
comments = ["I'm nervous about tomorrow.", 'I got a promotion at work!',
            "My best friend was in a car accident.", "I hate it when I'm cut off in traffic."]
df = pd.DataFrame({
    'over_18': ['yes', 'no', 'yes', 'yes'],
    'comments': comments,
})
df.head()
```
| | over_18 | comments |
|---|---|---|
| 0 | yes | I'm nervous about tomorrow. |
| 1 | no | I got a promotion at work! |
| 2 | yes | My best friend was in a car accident. |
| 3 | yes | I hate it when I'm cut off in traffic. |
```python
result_df = ac.code_emotion(df['comments'].values, df, binarize=True)
result_df.head()
```
| | over_18 | comments | joy | anger | fear | sadness |
|---|---|---|---|---|---|---|
| 0 | yes | I'm nervous about tomorrow. | 0 | 0 | 1 | 0 |
| 1 | no | I got a promotion at work! | 1 | 0 | 0 | 0 |
| 2 | yes | My best friend was in a car accident. | 0 | 0 | 0 | 1 |
| 3 | yes | I hate it when I'm cut off in traffic. | 0 | 1 | 0 | 0 |
```python
assert result_df.iloc[0]['fear'] == 1
assert result_df.iloc[1]['joy'] == 1
assert result_df.iloc[2]['sadness'] == 1
assert result_df.iloc[3]['anger'] == 1
```
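Because `binarize=True` yields 0/1 indicator columns, they aggregate cleanly with a `groupby`. A plain-pandas sketch (not part of `Autocoder`), using a `result_df` reconstructed from the binarized output above:

```python
import pandas as pd

# Reconstructed from the binarized example output above.
result_df = pd.DataFrame({
    'over_18': ['yes', 'no', 'yes', 'yes'],
    'comments': ["I'm nervous about tomorrow.", 'I got a promotion at work!',
                 "My best friend was in a car accident.",
                 "I hate it when I'm cut off in traffic."],
    'joy':     [0, 1, 0, 0],
    'anger':   [0, 0, 0, 1],
    'fear':    [1, 0, 0, 0],
    'sadness': [0, 0, 1, 0],
})

# Count how often each emotion appears within each group.
counts = result_df.groupby('over_18')[['joy', 'anger', 'fear', 'sadness']].sum()
print(counts.loc['yes'].tolist(), counts.loc['no'].tolist())
```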
Autocoder.code_transformer
[source]
`Autocoder.code_transformer(docs, df, batch_size=32, model_name='stsb-roberta-large', show_progress_bar=False)`
Encodes texts as semantically meaningful vectors using a Transformer model.
```python
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
```
| | gender | review |
|---|---|---|
| 0 | female | I loved this doctor! |
| 1 | male | This doctor was absolutely terrible. |
```python
df = ac.code_transformer(df.review.values, df)
df.head()
```
| | gender | review | e_0000 | e_0001 | ... | e_1022 | e_1023 |
|---|---|---|---|---|---|---|---|
| 0 | female | I loved this doctor! | -0.601180 | 0.639239 | ... | -0.179643 | -0.095540 |
| 1 | male | This doctor was absolutely terrible. | -1.080321 | 1.283710 | ... | -0.103962 | -0.018753 |

2 rows × 1026 columns (output truncated: one `e_*` column per embedding dimension)
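The `e_*` columns together form one embedding vector per row, so they support vector operations such as cosine similarity. A minimal sketch with a hypothetical 3-dimensional stand-in (the real output has 1024 `e_*` columns):

```python
import numpy as np
import pandas as pd

# Hypothetical tiny stand-in for the e_* embedding columns; the point
# is only how the columns can be used once code_transformer has run.
df = pd.DataFrame({
    'review': ["I loved this doctor!", "This doctor was absolutely terrible."],
    'e_0000': [0.9, -0.8],
    'e_0001': [0.1, 0.2],
    'e_0002': [-0.3, 0.7],
})

emb_cols = [c for c in df.columns if c.startswith('e_')]
vecs = df[emb_cols].to_numpy()

# Cosine similarity between the two reviews' embedding vectors.
a, b = vecs[0], vecs[1]
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(round(cos, 4))
```

With real embeddings, a strongly negative or low similarity would likewise indicate semantically dissimilar texts.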
Autocoder.code_lda_topics
[source]
`Autocoder.code_lda_topics(docs, df, k=10, n_features=10000)`
Encodes texts as semantically meaningful vectors using Latent Dirichlet Allocation.
```python
comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'] * 5,
    'comments': comments * 5,
})
df.head()
```
| | over_18 | comments |
|---|---|---|
| 0 | yes | What is your favorite sitcom of all time? |
| 1 | no | I cannot wait to vote! |
| 2 | yes | What is your favorite sitcom of all time? |
| 3 | no | I cannot wait to vote! |
| 4 | yes | What is your favorite sitcom of all time? |
```python
df = ac.code_lda_topics(df['comments'].values, df)
```

```
preprocessing texts...
fitting model...
iteration: 1 of max_iter: 5
iteration: 2 of max_iter: 5
iteration: 3 of max_iter: 5
iteration: 4 of max_iter: 5
iteration: 5 of max_iter: 5
done.
done.
```

```python
df.head()
```
| | over_18 | comments | topic_0000 | topic_0001 | topic_0002 | topic_0003 | topic_0004 | topic_0005 | topic_0006 | topic_0007 | topic_0008 | topic_0009 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | yes | What is your favorite sitcom of all time? | 0.148763 | 0.093341 | 0.080723 | 0.128911 | 0.109816 | 0.084724 | 0.093611 | 0.080860 | 0.091758 | 0.087493 |
| 1 | no | I cannot wait to vote! | 0.085687 | 0.097749 | 0.142486 | 0.084145 | 0.086931 | 0.099608 | 0.091913 | 0.114741 | 0.093014 | 0.103728 |
| 2 | yes | What is your favorite sitcom of all time? | 0.148763 | 0.093341 | 0.080723 | 0.128911 | 0.109816 | 0.084724 | 0.093611 | 0.080860 | 0.091758 | 0.087493 |
| 3 | no | I cannot wait to vote! | 0.085687 | 0.097749 | 0.142486 | 0.084145 | 0.086931 | 0.099608 | 0.091913 | 0.114741 | 0.093014 | 0.103728 |
| 4 | yes | What is your favorite sitcom of all time? | 0.148763 | 0.093341 | 0.080723 | 0.128911 | 0.109816 | 0.084724 | 0.093611 | 0.080860 | 0.091758 | 0.087493 |
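Each row's `topic_*` values form the document's distribution over the `k` topics, so they sum to approximately 1. A quick sanity check in plain pandas, using the two distinct rows from the output above:

```python
import pandas as pd

# Reconstructed from the example output above (the two distinct documents).
topic_cols = [f'topic_{i:04d}' for i in range(10)]
rows = [
    [0.148763, 0.093341, 0.080723, 0.128911, 0.109816,
     0.084724, 0.093611, 0.080860, 0.091758, 0.087493],
    [0.085687, 0.097749, 0.142486, 0.084145, 0.086931,
     0.099608, 0.091913, 0.114741, 0.093014, 0.103728],
]
df = pd.DataFrame(rows, columns=topic_cols)

# Each row is a probability distribution over the k topics.
sums = df.sum(axis=1)
print([round(s, 2) for s in sums])
```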
Autocoder.code_callable
[source]
`Autocoder.code_callable(docs, df, fn)`
Autocodes text with any user-specified function. The `fn` parameter must be a callable that returns, for each text in `docs`, a dictionary whose keys are the desired column names and whose values are scores or probabilities.
```python
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
```
| | gender | review |
|---|---|---|
| 0 | female | I loved this doctor! |
| 1 | male | This doctor was absolutely terrible. |
```python
def some_function(x):
    val = int('terrible' in x)
    return {'has_the_word_terrible?': val}

df = ac.code_callable(df.review.values, df, some_function)
df.head()
```
| | gender | review | has_the_word_terrible? |
|---|---|---|---|
| 0 | female | I loved this doctor! | 0 |
| 1 | male | This doctor was absolutely terrible. | 1 |
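Since `fn` returns one dictionary per text, a callable that returns several keys produces several new columns at once. The behavior can be approximated in plain pandas; `code_callable_sketch` below is a hypothetical stand-in, an assumption about what the method does rather than the library's actual implementation:

```python
import pandas as pd

# Hypothetical stand-in for ac.code_callable: apply fn to each doc and
# join the resulting dicts onto df as new columns.
def code_callable_sketch(docs, df, fn):
    coded = pd.DataFrame([fn(doc) for doc in docs], index=df.index)
    return pd.concat([df, coded], axis=1)

reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({'gender': ['female', 'male'], 'review': reviews})

def some_function(x):
    # Returning several keys yields one new column per key.
    return {'has_terrible': int('terrible' in x), 'n_words': len(x.split())}

out = code_callable_sketch(df['review'].values, df, some_function)
print(out['has_terrible'].tolist(), out['n_words'].tolist())
```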