Explanations
phrases related to negation or prohibition
negations and expressions of absence or denial
Negative Logits
INESS          -0.68
Previous       -0.67
Fields         -0.64
ALWAYS         -0.63
milliseconds   -0.63
personalities  -0.62
ARY            -0.61
Indiana        -0.59
Current        -0.58
somewhere      -0.57
Positive Logits
vae    0.98
zu     0.97
vez    0.96
lez    0.94
tu     0.94
nda    0.93
aque   0.91
ppa    0.90
lla    0.90
kan    0.89
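The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. As a minimal sketch, assuming the per-token effect is the feature's decoder direction projected through the model's unembedding matrix (the names `decoder_dir`, `W_U`, and `vocab` are hypothetical, and the toy numbers only mimic a few entries above), the ranking could look like this:

```python
import numpy as np

def logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Project a (d_model,) feature direction through a (d_model, vocab) unembedding."""
    return decoder_dir @ W_U

def top_and_bottom(effects: np.ndarray, vocab: list, k: int):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = np.argsort(effects)  # ascending: most negative first
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example (hypothetical values): d_model = 2, vocabulary of 4 tokens.
vocab = ["vae", "zu", "INESS", "Previous"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.98, 0.97, -0.68, -0.67],
                [0.50, 0.10, 0.20, 0.30]])

top, bottom = top_and_bottom(logit_effects(decoder_dir, W_U), vocab, k=2)
print(top)     # [('vae', 0.98), ('zu', 0.97)]
print(bottom)  # [('INESS', -0.68), ('Previous', -0.67)]
```

Whether the dashboard applies any normalization (for example, layer norm folding or scaling) before ranking is not stated here, so the raw dot product is an assumption.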
Activation Density: 0.062%
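Activation density is read here as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the function name and threshold parameter are hypothetical), shows that 0.062% corresponds to about 62 active positions per 100,000 tokens:

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic activations: 62 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:62] = 1.0
print(activation_density(acts))  # 0.062
```

A dashboard may compute this over a specific sampled dataset and with its own firing threshold; both choices here are illustrative assumptions.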