INDEX
Explanations
proper nouns appearing in the text
mentions of a specific name or title that begins with "Der."
Negative Logits
hetti         -0.83
INTER         -0.71
Dragonbound   -0.70
[|            -0.67
sburgh        -0.67
Samoa         -0.66
odcast        -0.65
ships         -0.64
margin        -0.63
hews          -0.63
Positive Logits
bys           0.98
ricks         0.93
bil           0.90
Spiegel       0.88
iving         0.88
mal           0.88
rek           0.87
isively       0.86
ived          0.84
ivation       0.84
Activations Density: 0.009%