INDEX
Explanations
pronouns referring to oneself or others
references to personal identity and relationships with others
Negative Logits
Token        Logit
odge         -0.67
ories        -0.66
rough        -0.62
iencies      -0.62
Safari       -0.61
FIG          -0.61
itialized    -0.60
Towns        -0.58
inctions     -0.58
1080         -0.58
Positive Logits
Token        Logit
personally   0.92
selves       0.83
atics        0.81
atic         0.81
dearly       0.78
self         0.78
imei         0.75
verbally     0.72
andering     0.70
atis         0.69
Activations Density: 0.138%