Explanations
text that appears to be computer code or technical in nature
specific capitalized acronyms or alphanumeric sequences
New Auto-Interp
Negative Logits
derog -0.65
mechanically -0.63
sabotage -0.63
instruments -0.62
vaguely -0.61
vacuum -0.60
indications -0.60
briefly -0.58
refres -0.58
cardboard -0.57
Positive Logits
ZX 1.14
XM 1.05
Iv 1.05
CN 1.01
Ct 0.99
Fu 0.99
Rh 0.98
Ry 0.98
=/ 0.98
Ni 0.97
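The card does not say how the logit values above were computed. One common approach, assumed here, is to take the feature's decoder direction, multiply it through the model's unembedding matrix, and list the tokens with the largest and smallest results. The sketch below uses pure Python; `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy weights are illustrative only, not the real model's.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    decoder_dir: length d_model (assumed feature direction in the residual stream)
    unembed: d_model rows, each of length vocab_size (assumed W_U layout)
    Returns one logit effect per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy numbers for illustration only; NOT the weights behind this card.
    vocab = ["ZX", "XM", "derog", "cardboard"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [1.0, 0.8, -0.4, -0.2],
        [0.2, 0.0, 0.5, 0.6],
    ]
    effects = logit_effects(decoder_dir, unembed)
    pos, neg = top_and_bottom(effects, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With real weights, `k=10` would produce two ten-item lists shaped like the ones above. Whether the viewer also normalizes or rescales the values is not stated in the source.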
Activation Density: 0.039%
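The card does not define activation density. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature's activation is above zero, is shown below. The zero threshold and the `activation_density` helper are assumptions for illustration. Under this reading, 0.039% corresponds to roughly 39 firing tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumes density means firing frequency over a sample of tokens;
    the default threshold of 0.0 is an assumption, not from the source.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return firing / total


if __name__ == "__main__":
    # Hypothetical sample: 39 firing tokens spread across 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(39):
        acts[i * 2_500] = 1.0
    print(f"{activation_density(acts):.3%}")
```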