Explanations
words related to current events and political situations
Negative Logits
ires      -0.76
ETS       -0.69
ocracy    -0.68
Pony      -0.64
ocratic   -0.62
efer      -0.60
estro     -0.59
ocrats    -0.58
eval      -0.57
onday     -0.57
Positive Logits
chard           1.62
acle            1.49
acles           1.46
Else            1.43
nam             1.42
ifice           1.39
chid            1.38
acular          1.37
alternatively   1.34
else            1.24
Activations Density: 2.000%