Explanations
the word "or" with a strong activation value
phrases indicating alternatives or choices
Negative Logits
Pony      -0.90
ocracy    -0.76
ocrats    -0.74
ocrat     -0.72
achine    -0.71
ulhu      -0.70
ascus     -0.70
ARDS      -0.68
Rocket    -0.66
efer      -0.66
Positive Logits
chard      1.35
Else       1.35
nam        1.29
acles      1.27
ifice      1.23
acle       1.23
nery       1.21
chid       1.14
otherwise  1.07
else       1.05
Activation Density: 0.145%