Auto Coder

Automatically codes text fields, such as open-ended survey responses, based on linguistic properties such as topic and sentiment.

source

Autocoder

 Autocoder (verbose=1, device=None)

Autocodes text fields


source

Autocoder.code_sentiment

 Autocoder.code_sentiment (docs, df, batch_size=8, binarize=False,
                           threshold=0.5)

Autocodes text for positive or negative sentiment

Let’s prepare a toy dataset:

import pandas as pd

ac = Autocoder()
reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
gender review
0 female I loved this doctor!
1 male This doctor was absolutely terrible.

After autocoding for sentiment, the dataframe now has extra columns:

result_df = ac.code_sentiment(df['review'].values, df)
result_df.head()
gender review negative positive
0 female I loved this doctor! 0.005034 0.994966
1 male This doctor was absolutely terrible. 0.981789 0.018211
assert result_df[result_df['gender']=='female']['negative'].values[0] < 0.1
assert result_df[result_df['gender']=='female']['positive'].values[0] > 0.9
assert result_df[result_df['gender']=='male']['negative'].values[0] > 0.9
assert result_df[result_df['gender']=='male']['positive'].values[0] < 0.1
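The binarize and threshold parameters in the signature above presumably turn these probability columns into 0/1 indicators. A minimal sketch of that thresholding in plain pandas (the binarize_columns helper is illustrative, not part of Autocoder; the probabilities are copied from the output above):

```python
import pandas as pd

# Probabilities as produced by code_sentiment (copied from the output above)
result_df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': ["I loved this doctor!", "This doctor was absolutely terrible."],
    'negative': [0.005034, 0.981789],
    'positive': [0.994966, 0.018211],
})

def binarize_columns(df, cols, threshold=0.5):
    # Replace each probability with 1 if it meets the threshold, else 0
    out = df.copy()
    out[cols] = (out[cols] >= threshold).astype(int)
    return out

binary_df = binarize_columns(result_df, ['negative', 'positive'])
```

With the default threshold of 0.5, each review is flagged as clearly positive or clearly negative.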

source

Autocoder.code_custom_topics

 Autocoder.code_custom_topics (docs, df, labels, batch_size=8,
                               binarize=False, threshold=0.5)

Autocodes text for user-specified topics. The labels argument is the name of a topic as a string (or a list of topic names).

Let’s prepare a toy dataset:

comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'],
    'comments': comments,
})
df.head()
over_18 comments
0 yes What is your favorite sitcom of all time?
1 no I cannot wait to vote!

After autocoding, the dataframe has a new column for each custom topic:

result_df = ac.code_custom_topics(df['comments'].values, df, labels=['television', 'film', 'politics'])
result_df.head()
over_18 comments television film politics
0 yes What is your favorite sitcom of all time? 0.981327 0.012260 0.000157
1 no I cannot wait to vote! 0.000518 0.004943 0.936988
assert result_df[result_df['over_18']=='yes']['television'].values[0] > 0.9
assert result_df[result_df['over_18']=='yes']['film'].values[0] < 0.1
assert result_df[result_df['over_18']=='yes']['politics'].values[0] < 0.1
assert result_df[result_df['over_18']=='no']['television'].values[0] < 0.1
assert result_df[result_df['over_18']=='no']['film'].values[0] < 0.1
assert result_df[result_df['over_18']=='no']['politics'].values[0] > 0.9
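Because each custom topic becomes its own probability column, the highest-scoring topic per comment can be read off with a pandas idxmax call (this is ordinary pandas, not an Autocoder feature; the values are copied from the output above):

```python
import pandas as pd

# Topic probabilities as returned by code_custom_topics (copied from the output above)
result_df = pd.DataFrame({
    'comments': ["What is your favorite sitcom of all time?", 'I cannot wait to vote!'],
    'television': [0.981327, 0.000518],
    'film': [0.012260, 0.004943],
    'politics': [0.000157, 0.936988],
})

# Pick the highest-scoring topic for each comment
result_df['top_topic'] = result_df[['television', 'film', 'politics']].idxmax(axis=1)
```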

source

Autocoder.code_emotion

 Autocoder.code_emotion (docs, df, batch_size=8, binarize=False,
                         threshold=0.5)

Autocodes text for emotion

comments = ["I'm nervous about tomorrow.", 'I got a promotion at work!',
            "My best friend was in a car accident.", "I hate it when I'm cut off in traffic."]
df = pd.DataFrame({
    'over_18': ['yes', 'no', 'yes', 'yes'],
    'comments': comments,
})
df.head()
over_18 comments
0 yes I'm nervous about tomorrow.
1 no I got a promotion at work!
2 yes My best friend was in a car accident.
3 yes I hate it when I'm cut off in traffic.
result_df = ac.code_emotion(df['comments'].values, df, binarize=True)
result_df.head()
over_18 comments joy anger fear sadness
0 yes I'm nervous about tomorrow. 0 0 1 0
1 no I got a promotion at work! 1 0 0 0
2 yes My best friend was in a car accident. 0 0 0 1
3 yes I hate it when I'm cut off in traffic. 0 1 0 0
assert result_df.iloc[0]['fear'] == 1
assert result_df.iloc[1]['joy'] == 1
assert result_df.iloc[2]['sadness'] == 1
assert result_df.iloc[3]['anger'] == 1
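With binarize=True each row carries one-hot emotion indicators, so a single emotion label per comment can be recovered with idxmax (again plain pandas on the output shown above, not part of the Autocoder API):

```python
import pandas as pd

emotions = ['joy', 'anger', 'fear', 'sadness']
# One-hot emotion indicators as in the binarized output above
result_df = pd.DataFrame({
    'comments': ["I'm nervous about tomorrow.", 'I got a promotion at work!',
                 "My best friend was in a car accident.",
                 "I hate it when I'm cut off in traffic."],
    'joy':     [0, 1, 0, 0],
    'anger':   [0, 0, 0, 1],
    'fear':    [1, 0, 0, 0],
    'sadness': [0, 0, 1, 0],
})

# idxmax over the indicator columns yields the coded emotion label
result_df['emotion'] = result_df[emotions].idxmax(axis=1)
```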

source

Autocoder.code_transformer

 Autocoder.code_transformer (docs, df, batch_size=32, model_name='stsb-roberta-large', show_progress_bar=False)

Encode texts as semantically meaningful vectors using a Transformer model

reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
gender review
0 female I loved this doctor!
1 male This doctor was absolutely terrible.
df = ac.code_transformer(df.review.values, df)
df.head()
gender review e_0000 e_0001 e_0002 ... e_1021 e_1022 e_1023
0 female I loved this doctor! -0.601180 0.639239 -1.060369 ... 0.864795 -0.179643 -0.095540
1 male This doctor was absolutely terrible. -1.080321 1.283710 0.032944 ... -0.237191 -0.103962 -0.018753

2 rows × 1026 columns
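A common use for these e_* embedding columns is measuring semantic similarity between texts, e.g. via cosine similarity. A minimal numpy sketch (toy 4-dimensional vectors stand in for the real 1024-dimensional embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors, in [-1, 1]
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for the e_* columns of two rows
v_pos = [0.9, 0.1, 0.0, 0.2]
v_neg = [-0.8, 0.0, 0.3, -0.1]

sim = cosine_similarity(v_pos, v_neg)
```

In practice you would slice the e_* columns out of the returned dataframe (e.g. with df.filter(like='e_')) and compare those row vectors.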


source

Autocoder.code_lda_topics

 Autocoder.code_lda_topics (docs, df, k=10, n_features=10000)

Encode texts as semantically meaningful vectors using Latent Dirichlet Allocation

comments = ["What is your favorite sitcom of all time?", 'I cannot wait to vote!']
df = pd.DataFrame({
    'over_18': ['yes', 'no'] * 5,
    'comments': comments * 5,
})
df.head()
over_18 comments
0 yes What is your favorite sitcom of all time?
1 no I cannot wait to vote!
2 yes What is your favorite sitcom of all time?
3 no I cannot wait to vote!
4 yes What is your favorite sitcom of all time?
df = ac.code_lda_topics(df['comments'].values, df)
preprocessing texts...
fitting model...
iteration: 1 of max_iter: 5
iteration: 2 of max_iter: 5
iteration: 3 of max_iter: 5
iteration: 4 of max_iter: 5
iteration: 5 of max_iter: 5
done.
done.
df.head()
over_18 comments topic_0000 topic_0001 topic_0002 topic_0003 topic_0004 topic_0005 topic_0006 topic_0007 topic_0008 topic_0009
0 yes What is your favorite sitcom of all time? 0.148763 0.093341 0.080723 0.128911 0.109816 0.084724 0.093611 0.080860 0.091758 0.087493
1 no I cannot wait to vote! 0.085687 0.097749 0.142486 0.084145 0.086931 0.099608 0.091913 0.114741 0.093014 0.103728
2 yes What is your favorite sitcom of all time? 0.148763 0.093341 0.080723 0.128911 0.109816 0.084724 0.093611 0.080860 0.091758 0.087493
3 no I cannot wait to vote! 0.085687 0.097749 0.142486 0.084145 0.086931 0.099608 0.091913 0.114741 0.093014 0.103728
4 yes What is your favorite sitcom of all time? 0.148763 0.093341 0.080723 0.128911 0.109816 0.084724 0.093611 0.080860 0.091758 0.087493
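Each topic_* column holds a document's weight on one of the k LDA topics; the weights per row form a probability distribution, so a dominant topic can again be read off with idxmax (plain pandas on the values shown above):

```python
import pandas as pd

topic_cols = [f'topic_{i:04d}' for i in range(10)]
# Topic weights for the two distinct comments (copied from the output above)
row0 = [0.148763, 0.093341, 0.080723, 0.128911, 0.109816,
        0.084724, 0.093611, 0.080860, 0.091758, 0.087493]
row1 = [0.085687, 0.097749, 0.142486, 0.084145, 0.086931,
        0.099608, 0.091913, 0.114741, 0.093014, 0.103728]
df = pd.DataFrame([row0, row1], columns=topic_cols)

# Each row is (approximately) a probability distribution over the k topics
assert (df.sum(axis=1).round(3) == 1.0).all()

df['top_topic'] = df[topic_cols].idxmax(axis=1)
```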

source

Autocoder.code_callable

 Autocoder.code_callable (docs, df, fn)

Autocodes text using any user-specified function. The fn parameter must be a Callable that returns a dictionary for each text in docs, where the keys are the desired column names and the values are scores or probabilities.

reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({
    'gender': ['female', 'male'],
    'review': reviews,
})
df.head()
gender review
0 female I loved this doctor!
1 male This doctor was absolutely terrible.
def some_function(x):
    val = int('terrible' in x)
    return {'has_the_word_terrible?' : val}
df = ac.code_callable(df.review.values, df, some_function)
df.head()
gender review has_the_word_terrible?
0 female I loved this doctor! 0
1 male This doctor was absolutely terrible. 1
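Since fn only needs to return a dict per text, richer coders with multiple output columns are easy to write. A sketch of what code_callable presumably does under the hood, namely building a frame from the per-text dicts and joining it column-wise (this is an assumption about the mechanism, not the actual source):

```python
import pandas as pd

def my_coder(text):
    # Return one dict per text; keys become new columns
    return {'n_words': len(text.split()),
            'has_exclamation': int('!' in text)}

reviews = ["I loved this doctor!", "This doctor was absolutely terrible."]
df = pd.DataFrame({'review': reviews})

# Apply the coder to each text and concatenate the results as new columns
coded = pd.concat([df, pd.DataFrame([my_coder(t) for t in reviews])], axis=1)
```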