INDEX
Explanations
words and phrases expressing surprise
Negative Logits
yakarta           -0.59
toJSONString      -0.56
ędzy              -0.52
TacToe            -0.51
styrelsen         -0.50
setViewportView   -0.48
erdings           -0.48
benzina           -0.48
brengen           -0.47
erobe             -0.47
Positive Logits
surprise          2.25
Surprise          2.05
surprises         2.03
surprised         2.00
Surprise          1.92
surprise          1.89
surprising        1.72
overras           1.71
surprised         1.70
Überras           1.69
Activations Density 0.151%
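
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: the feature fires on 3 of 2000 token positions.
sample = [0.0] * 1997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.151% would correspond to the feature firing on roughly 15 of every 10,000 tokens.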