Explanations
a role associated with representing others, potentially in a political or official capacity
Negative Logits
osure        -0.87
seed         -0.73
fters        -0.73
strap        -0.72
imb          -0.70
tered        -0.70
INESS        -0.69
Pound        -0.67
[|           -0.67
istically    -0.66
Positive Logits
hips         0.96
Kislyak      0.87
clinton      0.81
atures       0.79
onse         0.78
ority        0.75
akes         0.75
OTUS         0.74
warr         0.72
orial        0.69
Activations Density: 0.048%