INDEX
Explanations
proper nouns, particularly names of people and places
Negative Logits
Jr        -0.17
rana      -0.16
ůr        -0.15
Brendan   -0.14
bcc       -0.14
jr        -0.14
ilight    -0.14
reb       -0.14
ată       -0.14
JR        -0.14
POSITIVE LOGITS
herself      0.21
/he          0.18
pher         0.16
affer        0.15
arer         0.15
pector       0.15
rencontrer   0.14
.methods     0.14
Alman        0.14
ост          0.14
Activations Density 0.143%