INDEX
Explanations
possessive pronouns indicating ownership or relationships
Negative Logits
atics     -0.17
uffs      -0.15
зал       -0.15
@(        -0.14
blo       -0.14
letter    -0.14
atos      -0.13
rim       -0.13
runner    -0.13
alist     -0.13
Positive Logits
aim       0.26
goal      0.22
oret      0.17
缮        0.16
eyes      0.15
job       0.15
AIM       0.15
bark      0.15
focus     0.15
attempts  0.14
Activations Density: 0.218%