Explanations
expressions of irony and surprise
Negative Logits
Token      Logit
NOPQRST    -0.53
meriva     -0.52
scania     -0.52
сюда       -0.51   (Russian: "here, hither")
Merlin     -0.51
Manus      -0.50
aerop      -0.48
hydra      -0.48
behövs     -0.48   (Swedish: "is needed")
zegorz     -0.47
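Positive Logits
Token          Logit
Surprisingly   0.71
竟             0.70   (Chinese: "unexpectedly")
surprisingly   0.70
xically        0.69
ironically     0.68
居然           0.67   (Chinese: "unexpectedly, to one's surprise")
Ironically     0.67
estimés        0.66   (French: "estimated")
prisingly      0.63
竟然           0.62   (Chinese: "unexpectedly")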
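The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of that ranking, assuming (the page does not say so) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The helper names `logit_effects` and `top_and_bottom` and the toy 2-d vectors are hypothetical, chosen only so the scores match a few values listed above.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Hypothetical helper. Assumed scoring: project the feature's decoder
    direction onto each token's unembedding vector."""
    return {tok: dot(decoder_dir, col) for tok, col in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy vectors (hypothetical) whose projections reproduce a few page values.
    decoder_dir = [1.0, 0.0]
    unembed = {
        "Surprisingly": [0.71, 0.2],
        "ironically": [0.68, -0.1],
        "NOPQRST": [-0.53, 0.3],
        "scania": [-0.52, 0.0],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Run on the toy data, this lists `Surprisingly` and `ironically` as the top positive tokens and `NOPQRST` and `scania` as the most negative. A real dashboard would score the full vocabulary and keep the top ten on each side.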
Activation density: 0.260%
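Activation density is commonly read as the share of sampled token positions on which the feature's activation is nonzero. Under that reading, 0.260% means roughly 26 active positions per 10,000 tokens. Below is a minimal sketch under that assumption. The function name, the zero threshold, and the sample are hypothetical, since the page does not state its threshold or sample size.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold`.

    Assumption: "active" means strictly greater than zero. The page does not
    state its threshold or how many tokens were sampled.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 26 active positions out of 10,000 tokens.
    sample = [0.0] * 10_000
    for i in range(0, 2_600, 100):  # positions 0, 100, ..., 2500 (26 of them)
        sample[i] = 1.5
    print(f"{activation_density(sample):.3f}%")  # prints 0.260%
```

A density this low suggests a sparse feature that fires only on a small, specific set of contexts, consistent with the narrow irony/surprise explanation.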