INDEX
Explanations
names of people
references to specific individuals' names
Negative Logits
isions        -0.67
itals         -0.67
lust          -0.66
prog          -0.64
sung          -0.63
heimer        -0.63
Instruments   -0.62
locks         -0.62
fell          -0.62
iate          -0.61
Positive Logits
lication      0.85
igree         0.82
ogly          0.81
ocene         0.75
olin          0.74
atri          0.73
ele           0.72
lli           0.70
alty          0.70
ocy           0.69
Activation Density: 0.098%