Explanations
instances of the word "who" and its related forms in the context of personal descriptions
Negative Logits
abbo        -0.16
tti         -0.14
ternet      -0.14
ABEL        -0.14
igua        -0.14
ateria      -0.13
anske       -0.13
NÃį         -0.13
Responder   -0.13
irie        -0.13
Positive Logits
himself     0.18
whom        0.16
jezd        0.15
earlier     0.15
Fach        0.15
imli        0.14
osh         0.14
008         0.13
Earlier     0.13
ìŀĦ         0.13
Activations Density 0.103%