INDEX
Explanations
references to gossip or rumors
Negative Logits
adows     -0.16
arth      -0.16
bé        -0.16
ARTH      -0.15
aylor     -0.15
erna      -0.15
wire      -0.14
lıģa      -0.14
icts      -0.14
ipc       -0.14
Positive Logits
oured     0.32
blings    0.31
our       0.29
ination   0.28
pled      0.26
bling     0.25
ours      0.25
mage      0.24
ble       0.24
inate     0.23
Activations Density: 0.003%