INDEX
Explanations
phrases related to social or political manipulation of the masses
references to large groups of people
Negative Logits
tein         -1.05
cer          -0.76
Acknowled    -0.72
ces          -0.71
Nig          -0.65
Chain        -0.63
mingham      -0.63
chron        -0.62
Else         -0.62
NER          -0.61
Positive Logits
masses       0.92
ourcing      0.86
rake         0.82
hare         0.81
hari         0.75
urch         0.74
aic          0.74
ourced       0.73
ongs         0.73
ayers        0.72
Activations Density 0.015%