Explanations
mentions of the word "Congress"
mentions of Congress
Negative Logits
MM            -0.72
srf           -0.65
Sussex        -0.63
âķIJâķIJ        -0.63
imilar        -0.62
Britann       -0.62
999           -0.61
Var           -0.61
jiang         -0.60
Vertical      -0.60
Positive Logits
woman          1.27
ional          1.14
men            0.93
crit           0.87
ername         0.82
arians         0.81
member         0.81
appropriated   0.80
women          0.75
inition        0.74
Activations Density 0.022%