INDEX
Explanations
words related to specific organizations or groups
specific letter sequences or patterns that might indicate certain names or titles
Negative Logits
Token      Logit
Kat        -0.88
itte       -0.83
Beck       -0.80
Johnston   -0.79
jew        -0.76
Tinker     -0.76
McD        -0.74
Lennon     -0.73
iT         -0.73
Jordan     -0.72
Positive Logits
Token      Logit
UR         1.26
ur         1.22
ure        1.16
ural       1.14
mur        1.11
urs        1.11
ür         1.07
Mur        1.07
uri        1.02
uria       1.02
Activations Density: 0.346%
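
The second explanation ("specific letter sequences or patterns") can be checked against the logit lists with a few lines of Python. The sketch below is illustrative only: the token/logit pairs are copied from the lists above, while the helper names `fold` and `share_containing`, and the choice of "ur" as the pattern to test, are assumptions made for this example and are not part of the dashboard.

```python
import unicodedata

# Token/logit pairs copied from the positive- and negative-logit lists above.
positive_logits = [
    ("UR", 1.26), ("ur", 1.22), ("ure", 1.16), ("ural", 1.14), ("mur", 1.11),
    ("urs", 1.11), ("ür", 1.07), ("Mur", 1.07), ("uri", 1.02), ("uria", 1.02),
]
negative_logits = [
    ("Kat", -0.88), ("itte", -0.83), ("Beck", -0.80), ("Johnston", -0.79),
    ("jew", -0.76), ("Tinker", -0.76), ("McD", -0.74), ("Lennon", -0.73),
    ("iT", -0.73), ("Jordan", -0.72),
]


def fold(token: str) -> str:
    """Lowercase a token and strip diacritics, so 'ür' compares as 'ur'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def share_containing(pairs, pattern):
    """Return the tokens whose folded form contains `pattern`, and their share."""
    hits = [tok for tok, _ in pairs if pattern in fold(tok)]
    return hits, len(hits) / len(pairs)


# "ur" is a hypothetical pattern chosen by inspection of the positive list.
pos_hits, pos_share = share_containing(positive_logits, "ur")
neg_hits, neg_share = share_containing(negative_logits, "ur")

print(f"positive: {len(pos_hits)}/{len(positive_logits)} contain 'ur' ({pos_share:.0%})")
print(f"negative: {len(neg_hits)}/{len(negative_logits)} contain 'ur' ({neg_share:.0%})")
```

Under these assumptions, every promoted token contains the sequence "ur" once case and diacritics are folded, and none of the suppressed tokens do. This agrees with the letter-pattern explanation more directly than with the one about organizations or groups.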