INDEX
Explanations
Words referring to specific people or characters
The word "whom" in various contexts, indicating relationships or references to people
Negative Logits
Token        Logit
pad          -0.72
roads        -0.71
termin       -0.67
reth         -0.65
construct    -0.63
bard         -0.61
ply          -0.61
ricks        -0.61
starting     -0.60
Tokens       -0.60
Positive Logits
Token        Logit
soever        2.05
whom          0.74
izens         0.72
Allaah        0.71
nown          0.71
owship        0.69
selves        0.67
edIn          0.66
Pulitzer      0.65
dearly        0.64
Activation density: 0.020%