INDEX
Explanations
causal relationships or alternatives
phrases suggesting alternative outcomes or options
Negative Logits
ocracy    -0.94
ETS       -0.86
Pony      -0.83
ocrats    -0.79
ords      -0.73
Tycoon    -0.70
onday     -0.69
erest     -0.68
ascus     -0.67
eth       -0.65
Positive Logits
acles            1.22
chard            1.18
acle             1.17
nam              1.10
otherwise        1.08
chid             1.04
alternatively    1.03
ifice            1.03
acular           0.98
outright         0.94
Activations Density: 0.137%