INDEX
Explanations
terms related to political figures and affiliations
abbreviations or acronyms, particularly those associated with legislative contexts
Negative Logits
timet        -0.81
uncontroll   -0.74
maturity     -0.71
intrinsic    -0.71
tolerance    -0.71
ceilings     -0.69
blame        -0.68
margins      -0.68
rhy          -0.68
prejudice    -0.67
Positive Logits
Washington    1.06
Portland      1.02
Seattle       1.00
Calif         0.99
Florida       0.98
California    0.97
Chicago       0.96
achusetts     0.96
Philadelphia  0.94
rament        0.92
Activations Density: 0.018%