Explanations
references to U.S. political subjects or entities
Negative Logits
unheard        -0.77
Carbuncle      -0.69
fold           -0.69
xual           -0.67
agos           -0.64
advant         -0.61
sqor           -0.61
Ital           -0.61
Transcript     -0.60
raise          -0.60

Positive Logits
scientific      0.70
ng              0.70
Population      0.67
アル            0.64
shores          0.62
lawmakers       0.62
legislators     0.62
DN              0.61
advertising     0.61
circles         0.60
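
Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding vectors and then ranking tokens by the resulting dot product. This page does not say how its values were computed, so the following is only a minimal sketch that assumes that projection method. The names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and are used here purely for illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each
    token's unembedding vector; assumes both live in the same residual space."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: a 2-dimensional "residual stream" and a three-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    "lawmakers": [0.6, -0.1],
    "unheard": [-0.7, 0.2],
    "fold": [0.1, 0.3],
}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(neg, pos)
```

Real dashboards may also fold in normalization layers before projecting. This sketch ignores that step.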

Activation Density: 0.036%
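
An activation density of 0.036% means the feature fires on roughly 36 of every 100,000 tokens. The sketch below assumes that density is measured as the percentage of sampled tokens whose activation exceeds zero. The page does not state the threshold or the sample size, so both the zero threshold and the function name `activation_density` are assumptions made for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample: 36 active tokens out of 100,000.
sample = [0.0] * 99_964 + [1.2] * 36
print(f"{activation_density(sample):.3f}%")
```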