INDEX
Explanations
expressions of surprise or unexpected outcomes
Negative Logits
cshtml       -0.54
reconoci     -0.54
vœux         -0.51
норма        -0.49
vergleichen  -0.49
peka         -0.49
Blah         -0.49
enabling     -0.49
少了          -0.48
ReadAll      -0.48
Positive Logits
surprise     2.24
unexpected   2.11
surprises    1.97
surprising   1.88
Surprise     1.84
Unexpected   1.84
surprise     1.82
Surprise     1.77
unexpected   1.75
Unexpected   1.72
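The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A common way to produce such lists, assumed here rather than confirmed by this page, is to multiply the feature's decoder vector by the model's unembedding matrix and take the most extreme entries. The sketch below uses NumPy; `top_logit_tokens`, `decoder_vec`, `W_U`, and the four-token vocabulary are hypothetical toy stand-ins and do not reproduce the values shown.

```python
import numpy as np

def top_logit_tokens(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by a feature direction's direct effect on the logits.

    Assumption: decoder_vec has shape (d_model,) and W_U has shape
    (d_model, vocab_size); both names are hypothetical.
    """
    logit_effect = decoder_vec @ W_U          # one score per vocabulary token
    order = np.argsort(logit_effect)          # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data: d_model = 2, vocab_size = 4.
vocab = ["surprise", "unexpected", "cshtml", "ReadAll"]
W_U = np.array([[2.0, 1.0, -0.5, 0.0],
                [0.0, 1.0, 0.0, -0.5]])
decoder_vec = np.array([1.0, 0.5])

pos, neg = top_logit_tokens(decoder_vec, W_U, vocab, k=2)
# pos → [('surprise', 2.0), ('unexpected', 1.5)]
# neg → [('cshtml', -0.5), ('ReadAll', -0.25)]
```

Repeated entries such as "surprise" appearing twice in the positive list are plausibly distinct tokens (for example with and without a leading space) that render identically as text.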
Activation Density: 0.087%
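Activation density is the share of sampled tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the page states neither its threshold nor its sample size); `activation_density` is a hypothetical helper, and the toy array is constructed to land on 0.087%.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: a feature that fires on 87 of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:87] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # → 0.087%
```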