INDEX
Explanations
personal pronouns and possessive adjectives referring to people
pronouns and their use in various contexts
Negative Logits
nob      -0.65
jar      -0.64
gar      -0.60
UCHIJ    -0.60
Mand     -0.58
iting    -0.57
Ty       -0.56
Sidd     -0.56
onda     -0.55
let      -0.55
Positive Logits
xus            0.82
urally         0.70
differently    0.66
ãĤ¦            0.66
ACTED          0.65
behav          0.65
ãĤ¼ãĤ¦ãĤ¹      0.64
²¾             0.64
EMS            0.64
Reviewer       0.63
Activations Density 0.605%
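
Several entries in the positive-logit list (ãĤ¦, ãĤ¼ãĤ¦ãĤ¹, ²¾) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to decode such strings, assuming the underlying tokenizer uses the GPT-2 style byte-to-unicode mapping; the source does not name the model or tokenizer, so that is an assumption, and the helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values to printable characters.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are kept as their own (printable) character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse map: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed token string back into text.

    Incomplete UTF-8 sequences (a token holding only part of a
    character) are rendered as U+FFFD replacement characters.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¦", "ãĤ¼ãĤ¦ãĤ¹", "²¾", "behav"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, ãĤ¦ decodes to the katakana ウ and ãĤ¼ãĤ¦ãĤ¹ to ゼウス, while ²¾ consists only of UTF-8 continuation bytes and does not form a complete character on its own. Plain ASCII tokens such as behav pass through unchanged.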