INDEX
Explanations
the word "off" and its variants in different contexts
Negative Logits
bout            -0.15
ISP             -0.15
.override       -0.15
asha            -0.14
faction         -0.14
antino          -0.14
.googlecode     -0.14
azar            -0.14
epar            -0.14
iero            -0.14
Positive Logits
ensively        0.17
abus            0.15
acial           0.15
enstein         0.14
ICIAL           0.14
imus            0.14
endale          0.14
usercontent     0.14
nonexistent     0.13
ices            0.13
Activations Density: 0.018%