INDEX
Explanations
phrases indicating contrary or qualifying statements
phrases that convey a sense of caveats or qualifications
Negative Logits
java        -0.71
ngth        -0.66
bid         -0.66
wine        -0.65
assic       -0.65
antha       -0.65
Discussion  -0.64
bis         -0.63
nesota      -0.63
utenberg    -0.62
Positive Logits
anymore      0.95
anything     0.86
erest        0.83
necessarily  0.81
goodbye      0.74
ERSON        0.74
anyone       0.73
anybody      0.72
exactly      0.72
terday       0.72
Activations Density: 0.030%