INDEX
Explanations
mentions of different possible outcomes or results
mentions of "outcome" indicating results or consequences
Negative Logits
ker      -0.77
cer      -0.72
afort    -0.71
king     -0.70
Cola     -0.69
ovie     -0.68
nan      -0.67
pload    -0.67
uggage   -0.66
yi       -0.66
Positive Logits
outcome         1.12
outcomes        1.02
bringer         0.88
thereof         0.76
result          0.72
ebin            0.71
probabilities   0.70
Result          0.69
Winner          0.67
winner          0.67
Activations Density: 0.008%
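The page does not define how the density figure is computed. A minimal sketch, assuming it means the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the synthetic activation list are all hypothetical and are not the feature's real data:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled token positions on which
    the feature fires at all. The source page does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic illustration only: 100,000 token positions, 8 of them active.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.008% corresponds to roughly 8 active positions per 100,000 sampled tokens, which fits a feature that fires only on narrow contexts such as "outcome" or "result".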