INDEX
Explanations
the word "fact" followed by a statement or observation
expressions of personal beliefs or statements of opinion
Negative Logits
bowl      -0.70
rolling   -0.69
area      -0.66
apo       -0.64
ogie      -0.64
LC        -0.62
cox       -0.61
ouses     -0.61
airs      -0.60
odon      -0.60
Positive Logits
quite      0.76
downright  0.74
worse      0.70
overest    0.65
reverse    0.63
bnb        0.62
sterdam    0.60
oln        0.59
ocry       0.59
pretty     0.59
Activations Density: 0.216%
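
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the feature fires. This is an assumption about the metric, not something the page states. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The page itself does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 2 of which are active.
toy_activations = [0.0] * 998 + [1.3, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.216% would mean the feature is active on roughly 2 of every 1,000 token positions sampled.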