Explanations
statements attributed to individuals
instances of proper nouns, particularly titles and names
Negative Logits
livest     -0.82
wcs        -0.79
unden      -0.78
anwhile    -0.78
arily      -0.77
ioned      -0.72
cember     -0.69
destro     -0.69
charact    -0.67
encount    -0.67
Positive Logits
Robot      0.99
Bezos      0.95
Speaker    0.94
Spock      0.94
Weasley    0.94
Snowden    0.93
Obama      0.93
Trump      0.92
McMahon    0.92
Musk       0.92
Activations Density 0.038%