INDEX
Explanations
personal pronouns indicating possession
phrases that reference individuals and their personal attributes or actions
Negative Logits
Token             Logit
Reviewed          -0.80
γ                 -0.77
Ïī                -0.75
Downloadha        -0.73
—-                -0.71
models            -0.70
Æ                 -0.69
(not rendered)    -0.69
ÙIJ               -0.69
needed            -0.69
Positive Logits
Token             Logit
wife              1.11
eldest            1.10
hobbies           1.08
nickname          1.07
father            1.05
biography         1.04
motto             1.03
surname           1.02
daughter          1.01
foray             1.00
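Dashboards like this one typically rank tokens by projecting a feature's output (decoder) direction onto the model's unembedding vectors and then reporting the highest and lowest scores. The sketch below shows that ranking on toy data. It assumes a plain dot product with no layer norm or rescaling, and the names `top_logits`, `direction`, and `unembed` are hypothetical. It does not reproduce the values listed above.

```python
from typing import Dict, List, Tuple


def top_logits(
    direction: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the (positive, negative) top-k tokens for a feature direction.

    Assumption: a token's score is the plain dot product of the hypothetical
    feature direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending by score: the most negative tokens come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional "model" (all values are made up).
direction = [1.0, -0.5]
unembed = {
    "wife": [1.0, 0.0],
    "father": [0.8, 0.2],
    "models": [-0.5, 0.4],
    "needed": [0.0, 1.0],
}
pos, neg = top_logits(direction, unembed, k=2)
print([t for t, _ in pos], [t for t, _ in neg])
```

On this toy data, "wife" and "father" rank highest and "models" and "needed" rank lowest, which is the same shape as the two lists in the dashboard.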
Activation density: 0.203%
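Activation density is usually read as the share of token positions where the feature fires. A density of 0.203% works out to roughly 2 of every 1,000 tokens. The sketch below computes this share, assuming "fires" means an activation strictly above a threshold of 0. The function name `activation_density` is hypothetical and does not come from the dashboard.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than the threshold (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 2 active positions out of 1,000 tokens gives a density of 0.2%.
acts = [0.0] * 998 + [1.2, 0.5]
print(activation_density(acts))
```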