Explanations
phrases related to different potential outcomes
Negative Logits
Token    Value
ker      -0.77
nan      -0.76
cer      -0.74
ju       -0.70
ondo     -0.70
uni      -0.70
uga      -0.69
uzz      -0.69
king     -0.69
elong    -0.69
Positive Logits
Token     Value
outcome   1.20
outcomes  1.02
bringer   0.82
result    0.77
Result    0.76
thereof   0.76
Winner    0.71
Orche     0.70
Cruel     0.70
winner    0.69
Activations Density: 0.011%