INDEX
Explanations
the repeated use of the word "who," indicating a focus on individuals and their identities
Negative Logits
atic    -0.15
yen     -0.15
tn      -0.15
asn     -0.15
anken   -0.14
kan     -0.14
SURE    -0.13
illes   -0.13
ubi     -0.13
gaard   -0.13
Positive Logits
oping   0.20
upon    0.16
OSH     0.15
osh     0.14
hind    0.14
////////////////////////////////////////////////////////////  0.14
akin    0.14
oser    0.13
endale  0.13
’ve     0.13
Activations Density 0.130%
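
As a rough illustration of what a density figure like this usually measures, the sketch below computes the fraction of token positions on which a feature is active. It assumes density means "share of tokens with activation above a threshold of 0"; the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical and are not taken from this feature's real activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample (not real data): 13 active positions out of 10,000 tokens.
sample = [0.0] * 9987 + [1.2] * 13
print(f"{activation_density(sample):.3%}")
```

The synthetic sample is built so that 13 of 10,000 positions are active, which formats as 0.130%, the same order of magnitude as the figure shown above.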