Explanations
possessive pronouns and references to ownership
Negative Logits
bé        -0.17
arness    -0.16
前の      -0.15
isos      -0.15
fad       -0.15
orum      -0.14
bras      -0.14
кости     -0.14
orro      -0.14
SCP       -0.14
Positive Logits
mind              0.20
radar             0.18
conscience        0.17
life              0.17
pur               0.17
.scalablytyped    0.17
eyes              0.17
ears              0.17
pur               0.16
hear              0.15
Activations Density 0.165%