INDEX
Explanations
expressions of surprise or unexpected outcomes
Negative Logits
cheminée    -0.45
Cabo        -0.44
VYMaps      -0.44
writerow    -0.44
Manus       -0.42
atún        -0.42
lå          -0.42
audrait     -0.41
grasas      -0.41
<code>      -0.41
Positive Logits
Surprise    0.80
surprised   0.77
surprise    0.77
surpris     0.72
Delight     0.69
surprise    0.68
surprised   0.68
Surprise    0.68
surprises   0.65
Surprised   0.62
Activations Density 0.408%