INDEX
Explanations
proper nouns occurring in the text
tokens related to proper nouns or names
Negative Logits
elig            -0.76
Bundes          -0.68
descend         -0.68
opio            -0.66
sugg            -0.65
Babel           -0.63
sqor            -0.61
dehydration     -0.61
jud             -0.61
hallucinations  -0.60
POSITIVE LOGITS
igans    0.90
nikov    0.89
vana     0.84
igham    0.80
chenko   0.79
utical   0.77
bug      0.75
ucket    0.73
grass    0.72
bugs     0.72
Activations Density: 0.104%