INDEX
Explanations
occurrences of certain initials or abbreviations related to people's names
Negative Logits
ystal        -0.17
amus         -0.16
ledi         -0.16
deniz        -0.16
ÏĮγ          -0.15
SF           -0.15
oin          -0.15
unks         -0.15
ack          -0.15
.MoveNext    -0.14
Positive Logits
ONES         0.23
ansen        0.19
affe         0.19
eline        0.18
il           0.18
astr         0.18
aks          0.17
ones         0.17
ones         0.17
bara         0.17
Activations Density 0.024%
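
The density figure can be read as the share of token positions on which the feature fires. The following is a minimal sketch, assuming density means the fraction of positions whose activation is above zero; the function name, the threshold, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 24 active positions out of 100,000 tokens.
sample = [1.0] * 24 + [0.0] * 99_976
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.024%
```

Under this reading, a density of 0.024% corresponds to roughly 24 active positions per 100,000 tokens, so the feature is sparse.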