Explanations
contractions involving 'isn't' and 'doesn't'
negations or phrases expressing disagreement
Negative Logits
princ          -0.67
symp           -0.65
prelim         -0.63
guid           -0.61
nearest        -0.61
FANT           -0.60
Introduced     -0.59
interstitial   -0.58
passing        -0.57
pressing       -0.57
Positive Logits
't             1.59
´              0.96
ÃŃ             0.88
uts            0.86
ned            0.86
nit            0.83
ates           0.79
n              0.79
itates         0.78
âĤ¬            0.74
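Some tokens in the positive-logit list, such as ÃŃ and âĤ¬, look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the model uses a GPT-2 style byte-to-character mapping (the page does not state this) and shows how such strings could be turned back into UTF-8. The function names gpt2_byte_decoder and decode_token are illustrative and not part of any tool.

```python
def gpt2_byte_decoder():
    """Map each display character back to the byte it stands for.

    Assumption: GPT-2 style byte-level BPE, where printable bytes keep
    their own code point and all other bytes are shifted to 256 and up.
    """
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = keep[:]
    code_points = keep[:]
    shifted = 0
    for b in range(256):
        if b not in keep:
            # Non-printable byte: assigned the next code point above 255.
            byte_values.append(b)
            code_points.append(256 + shifted)
            shifted += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Decode one vocabulary string to text, replacing invalid UTF-8."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ÃŃ"))   # í
print(decode_token("âĤ¬"))  # €
```

Under this assumption, a token that holds only part of a multi-byte character (possibly the case for ´) cannot be decoded on its own and comes back as the Unicode replacement character.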
Activations Density 0.127%