INDEX
Explanations
relationships and personal connections
Negative Logits
own       -0.17
own       -0.17
exo       -0.17
ones      -0.16
],[-      -0.15
Own       -0.15
idth      -0.15
Others    -0.14
ien       -0.14
union     -0.14
Positive Logits
mine      0.46
ours      0.38
mine      0.35
Mine      0.33
hers      0.33
theirs    0.32
Mine      0.32
mines     0.30
yours     0.28
Yours     0.25
Activations Density 0.034%