INDEX
Explanations
mentions of names or title-like identifiers
Negative Logits
Token         Logit
oxy           -0.17
usercontent   -0.17
aux           -0.16
a             -0.15
versations    -0.15
Dead          -0.14
Bowman        -0.14
omba          -0.14
atto          -0.14
lio           -0.14
Positive Logits
Token         Logit
ipur          0.30
eger          0.26
egers         0.22
itle          0.21
imes          0.21
ime           0.19
Rule          0.19
quet          0.19
Ja            0.18
Ja            0.17
Activations Density: 0.010%