Explanations
phrases related to controversy and public discourse
Negative Logits
Token           Logit
ERG             -0.71
behind          -0.67
orrect          -0.66
arcity          -0.65
ahead           -0.64
pring           -0.63
Ec              -0.60
synonymous      -0.60
donors          -0.59
Ahead           -0.57
Positive Logits
Token           Logit
microscope       1.08
guise            0.96
ausp             0.90
hood             0.89
radar            0.86
veil             0.84
scanner          0.82
jurisdiction     0.81
canopy           0.81
supervision      0.81
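In SAE feature dashboards, logit lists like these are commonly produced by multiplying the feature's decoder direction by the model's unembedding matrix and then ranking the per-token scores. Below is a minimal sketch under that assumption. The function `logit_effects`, the 2-dimensional toy matrices, and the placeholder tokens `tok0` to `tok3` are all hypothetical and do not reproduce this feature's values. Real dashboards may also account for the final layer norm, which this sketch ignores.

```python
def logit_effects(decoder_vec, unembed, vocab, k):
    """Project a feature's decoder direction through the unembedding matrix.

    decoder_vec: list of d_model floats (hypothetical feature direction)
    unembed:     d_model x vocab_size matrix, given as a list of rows
    vocab:       token strings, one per unembedding column
    Returns (positive, negative): the k highest- and k lowest-scoring
    (token, score) pairs, each ordered from most extreme inward.
    """
    d_model = len(decoder_vec)
    # Score for token j is the dot product of the decoder vector with column j.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 placeholder tokens.
vocab = ["tok0", "tok1", "tok2", "tok3"]
unembed = [
    [1.0, 0.5, -0.5, -1.0],
    [0.5, 0.5, -0.5, 0.0],
]
decoder_vec = [1.0, 0.5]
positive, negative = logit_effects(decoder_vec, unembed, vocab, k=2)
print(positive)  # [('tok0', 1.25), ('tok1', 0.75)]
print(negative)  # [('tok3', -1.0), ('tok2', -0.75)]
```

A positive score means that adding the feature's direction to the residual stream raises that token's logit, and a negative score means it lowers it.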
Activation density: 0.056%
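Activation density is usually read as the fraction of sampled token positions where the feature's activation is above zero. A density of 0.056% therefore means roughly 56 firing positions per 100,000 tokens. Below is a minimal sketch assuming a strict `> 0` firing threshold. The function `activation_density` and the toy activation list are hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 2 firing positions out of 10,000 tokens.
acts = [0.0] * 9_998 + [2.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```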