INDEX
Explanations
pronouns and possessive determiners linked to a specific person
references to identity and belonging
Negative Logits
Ö¼          -0.64
Bur         -0.62
Orig        -0.59
çīĪ         -0.58
utsche      -0.58
rule        -0.57
Eva         -0.56
giene       -0.56
æ©Ł         -0.55
scrimmage   -0.55
Positive Logits
urated      0.84
ioned       0.78
idered      0.74
itable      0.72
amed        0.71
wiser       0.70
omorphic    0.70
ouched      0.69
fruitful    0.68
versible    0.68
Activations Density 0.619%