Explanations
people's titles, especially in governmental positions
references to political institutions and figures
Negative Logits
dotted         -0.66
dots           -0.64
plet           -0.61
fall           -0.60
destruct       -0.60
skirm          -0.59
jurors         -0.58
lanes          -0.58
fman           -0.57
discrepancy    -0.56
Positive Logits
Philippe       0.89
Riy            0.89
Boris          0.85
Nikki          0.82
Salman         0.81
Denis          0.80
Rupert         0.79
Ruth           0.79
Theresa        0.77
Leo            0.77
Activations Density: 0.247%