Explanations
the word "or" along with a high activation value
conditional phrases indicating alternatives or lack thereof
Negative Logits
Token    Logit
ETS      -0.71
edu      -0.69
efer     -0.67
auga     -0.67
edia     -0.64
Sorce    -0.63
horm     -0.63
imes     -0.62
aylor    -0.62
KEN      -0.62
Positive Logits
Token        Logit
chard        1.14
Else         1.10
nam          1.08
ifice        1.03
acles        1.00
acle         0.98
chid         0.97
else         0.93
otherwise    0.92
ific         0.91
Activations Density 0.129%