Explanations
words related to surprise or unexpected outcomes
Negative Logits
Token          Logit
ArrowToggle    -0.72
Autoritní      -0.64
weird          -0.63
Nero           -0.63
bizarre        -0.62
.*")]          -0.62
Controllo      -0.62
crazy          -0.61
weird          -0.60
Kat            -0.60
Positive Logits
Token              Logit
Surprise           0.88
***************/   0.81
Surprise           0.81
surprise           0.74
surprises          0.72
surprise           0.69
prises             0.68
Suf                0.68
AddTagHelper       0.68
XCTest             0.68
Activations Density 0.010%