Explanations
occurrences of surprising or unexpected events and outcomes in competitive contexts
unexpected outcomes
Negative Logits
Token      Logit
roda       -0.44
slaves     -0.42
WHE        -0.40
Slave      -0.38
Rein       -0.37
Boot       -0.37
water      -0.37
Cumming    -0.36
manne      -0.36
Shiro      -0.36
Positive Logits
Token            Logit
underdog         0.61
Überras          0.60
IsMutable        0.58
überras          0.58
StructEnd        0.57
featureID        0.55
surpresa         0.54
unexpectedly     0.54
richTextPanel    0.54
تقاوى            0.52
Activations Density: 0.012%