INDEX
Explanations
instances of dialogue and punctuation indicating speech
Negative Logits
hei       -0.15
ighted    -0.15
kle       -0.14
prit      -0.14
lawful    -0.13
riot      -0.13
vida      -0.13
OWN       -0.13
saya      -0.13
iendo     -0.13
Positive Logits
behind    0.19
Inst      0.19
Gods      0.18
Inst      0.18
Relief    0.17
sweat     0.17
Normally  0.17
Pull      0.16
defgroup  0.16
tion      0.15
Activations Density: 0.265%