INDEX
Explanations
proper nouns
mentions of individuals or characters associated with specific names
Negative Logits
Token              Logit
eering             -0.96
ably               -0.86
OPLE               -0.85
IBLE               -0.78
ÙĦ                 -0.74
د                  -0.74
++++++++++++++++   -0.74
urdue              -0.71
ually              -0.67
antly              -0.66
Positive Logits
Token              Logit
Ny                 1.13
quist              0.87
heter              0.85
borg               0.80
ota                0.78
comed              0.77
yk                 0.73
Ey                 0.73
seys               0.72
ody                0.72
Activations Density: 0.007%