INDEX
Explanations
phrases related to individuals' names
references to specific individuals mentioned in the text
Negative Logits
Token        Logit
orthy        -0.78
vation       -0.75
ctica        -0.74
Sussex       -0.74
phys         -0.70
itably       -0.70
ilight       -0.69
aciously     -0.69
itable       -0.68
phia         -0.66
Positive Logits
Token        Logit
enegger      0.78
ertodd       0.78
reluct       0.74
ham          0.71
Pengu        0.71
mann         0.70
ensen        0.70
linger       0.69
Clicker      0.66
aturdays     0.65
Activations Density: 0.087%