Explanation
Names of specific people, possibly from news articles
Negative logits
*/(         -0.82
cision      -0.77
derogatory  -0.66
achers      -0.64
urers       -0.63
things      -0.63
ocratic     -0.63
ework       -0.62
cipline     -0.62
chnology    -0.62
Positive logits
aii         0.86
Bei         0.76
Sue         0.75
Karen       0.75
Allen       0.74
Anne        0.74
Silk        0.73
Larson      0.73
Bang        0.70
Dunham      0.70
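A common way to produce logit lists like the two above is to project the feature's decoder direction onto every token's unembedding vector and keep the most positive and most negative scores. The page does not confirm that this is how its numbers were computed, so treat the following as a minimal sketch under that assumption. The function name `top_and_bottom_logits`, the parameters `decoder_dir` and `unembed`, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each token by the dot product of the feature's (assumed) decoder
    direction with that token's unembedding vector, then return the k highest
    ("positive logits") and k lowest ("negative logits") scores."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative: tokens the feature pushes down
    top = ranked[::-1][:k]     # most positive: tokens the feature pushes up
    return top, bottom


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy model; real models use thousands of dimensions.
    decoder_dir = [1.0, 0.0]
    unembed = {"Anne": [0.9, 0.2], "Sue": [0.3, -0.1], "ework": [-0.5, 0.4]}
    top, bottom = top_and_bottom_logits(decoder_dir, unembed, k=1)
    print(top, bottom)
```

With these toy inputs, "Anne" scores highest and "ework" lowest, mirroring how name tokens sit at the top of this feature's list and subword fragments sit at the bottom.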
Activation density: 0.020%
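An activation density of 0.020% means the feature fires on about 2 of every 10,000 tokens (0.020% = 0.0002 = 2/10,000). The sketch below assumes density is the percentage of sampled tokens whose activation exceeds zero; the dashboard's actual threshold and token sample are not stated, and the name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens on which the feature fires,
    i.e. whose activation is strictly greater than `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 10,000 tokens, of which the feature fires on 2.
    acts = [0.0] * 9998 + [1.3, 0.4]
    print(f"{activation_density(acts):.3f}%")
```

Run on the hypothetical sample, this prints `0.020%`, the same density the dashboard reports for this feature.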