INDEX
Explanations
prominent or high-ranking individuals in various organizations
NEGATIVE LOGITS
ense      -0.63
trigger   -0.63
OUND      -0.63
pire      -0.63
\">       -0.62
fw        -0.61
arget     -0.61
avers     -0.61
laugh     -0.60
fitting   -0.59
POSITIVE LOGITS
Andrew    0.99
Theo      0.98
Ian       0.97
Tobias    0.94
Jamie     0.93
Sarah     0.93
Andy      0.93
Patrick   0.93
Brendan   0.93
Betsy     0.92
Activations Density 0.210%