INDEX
Explanations
names or titles, specifically those related to professions or positions
phrases indicating roles or positions within organizations or departments
Negative Logits
aan        -0.77
mort       -0.75
BR         -0.74
llah       -0.69
JV         -0.68
gorge      -0.68
danced     -0.67
ptions     -0.65
Driver     -0.64
isphere    -0.64
Positive Logits
geries         0.96
bidden         0.85
whom           0.83
gery           0.79
example        0.78
Horizon        0.77
Borderlands    0.77
instance       0.76
gotten         0.76
aging          0.75
Activations Density: 0.087%