Explanations
references to personal relationships and familial connections
Negative Logits
URITY        -0.16
Záp          -0.15
assis        -0.15
deniz        -0.15
("")]↵       -0.15
EXEMPLARY    -0.14
jist         -0.14
uos          -0.14
appa         -0.14
mne          -0.14
Positive Logits
loved        0.73
Loved        0.62
relative     0.43
relatives    0.41
relative     0.40
Relative     0.36
Relatives    0.34
Relative     0.34
loves        0.32
family       0.31
Activations Density 0.174%