INDEX
Explanations
elements indicating boolean states or flags in configurations
Negative Logits
lore     -0.18
ierce    -0.16
lot      -0.16
lor      -0.16
din      -0.16
icks     -0.16
odon     -0.15
ri       -0.15
esson    -0.15
ow       -0.15
Positive Logits
/false   0.22
ushima   0.15
assen    0.15
vais     0.15
hetic    0.14
izoph    0.14
oplast   0.14
ongs     0.14
andid    0.14
kommen   0.14
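Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most promoted and most suppressed tokens. The sketch below assumes exactly that direct-path projection (decoder_dir @ W_U, ignoring any final layer norm); the function name, the toy matrix, and the toy vocabulary are hypothetical and are not taken from this page's actual pipeline.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=3):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Assumption (hypothetical): the logit effect is approximated by the direct
    path decoder_dir @ W_U, with no final layer norm applied.
    """
    logits = decoder_dir @ W_U           # shape: (vocab_size,)
    order = np.argsort(logits)           # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dim residual stream and a 4-token vocabulary (all made up).
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.2, -0.1, 0.05, -0.3],
                [9.0, 9.0, 9.0, 9.0]])
vocab = ["/false", "lore", "tok_a", "tok_b"]
pos, neg = top_logit_tokens(decoder_dir, W_U, vocab, k=1)
print(pos, neg)
```

Sorting once and slicing from both ends gives the positive and negative lists in a single pass over the vocabulary.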
Activation Density 0.038%
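Activation density is usually read as the fraction of token positions on which the feature fires, so 0.038% corresponds to roughly one token in 2,600. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold parameter and function name are hypothetical):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): a feature counts as active when its
    activation is strictly greater than the threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# One active position out of 2,600 gives a density of about 0.038%.
acts = np.zeros(2600)
acts[123] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")
```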