INDEX
Explanations
sentences discussing possibilities or potential courses of action
Negative Logits
Constant    -0.64
Fighter     -0.64
feeding     -0.63
ging        -0.63
ocracy      -0.61
raphic      -0.61
Lifetime    -0.60
performing  -0.60
ciating     -0.60
Palest      -0.60
Positive Logits
onna        1.12
hap         1.11
haps        1.09
bes         1.04
be          1.03
derive      0.89
confuse     0.88
owe         0.87
have        0.81
misunder    0.81
Activations Density: 0.300%