INDEX
Explanations
contractions involving "does not"
negations or phrases indicating disagreement or denial
Negative Logits
Classification   -0.68
protected        -0.65
PU               -0.64
learning         -0.63
Carth            -0.62
nearest          -0.62
Butt             -0.60
Letter           -0.60
elimination      -0.60
pockets          -0.59
Positive Logits
't               1.64
ÃŃ               1.01
´                0.98
etsk             0.91
uts              0.91
n                0.90
ates             0.90
acio             0.90
itely            0.88
eness            0.87
Activations Density 0.124%
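
For orientation, a density of 0.124% would correspond to the feature firing on roughly 1 in 800 tokens, assuming density means the fraction of sampled token positions with a non-zero activation. The page does not state that definition, so treat it as an assumption. A minimal sketch, with a hypothetical `activation_density` helper and made-up toy data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition of "activation density"; the source page
    does not say how its figure is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 10,000 positions, 12 of them active.
toy = [0.0] * 9988 + [1.5] * 12
density = activation_density(toy)
print(f"{density:.3%}")
```

The `threshold` parameter is also an assumption: some tools count any non-zero activation, others apply a small cutoff.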