INDEX
Explanations
words related to exclamations or emphatic expressions
expressions of excitement or emphasis
Negative Logits
Token    Logit
apt      -0.71
maj      -0.67
pessim   -0.65
raph     -0.65
76       -0.63
nec      -0.62
Aur      -0.61
aer      -0.61
curs     -0.60
mull     -0.60
Positive Logits
Token    Logit
!'       1.24
!.       1.16
!,       1.13
!'"      1.09
!:       1.08
!        1.05
!/       1.01
!]       0.98
!?       0.98
!".      0.95
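The positive and negative lists above rank tokens by how strongly the feature pushes their output logits up or down. A minimal sketch of how such lists are commonly produced follows. It assumes the values are direct logit effects, i.e. the feature's decoder direction multiplied by the model's unembedding matrix W_U; the dashboard does not state its method. The functions logit_effects and top_tokens, and all vectors, matrices, and tokens below, are hypothetical toy values chosen for illustration.

```python
# Hypothetical sketch: assumes the dashboard's "logits" are direct logit
# effects, i.e. the feature's decoder direction times the unembedding matrix.
# All vectors, matrices, and tokens here are made-up toy values.


def logit_effects(decoder_dir, unembed):
    """Return one logit effect per vocabulary token.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_tokens(effects, vocab, k, largest=True):
    """Return the k (token, effect) pairs with the largest or smallest effect."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=largest)
    return ranked[:k]


# Toy usage: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["!", "!.", "apt", "maj"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.8, -0.3, 0.1],
    [0.2, -0.4, 0.6, 0.9],
]
effects = logit_effects(decoder_dir, unembed)
positive = top_tokens(effects, vocab, k=2)                 # "!." then "!"
negative = top_tokens(effects, vocab, k=2, largest=False)  # "apt" then "maj"
```

Sorting the same effect vector in both directions is what yields a pair of lists like the ones above from a single projection.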
Activation Density: 0.211%
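A density of 0.211% means the feature is active on roughly two of every thousand tokens. A minimal sketch of the calculation follows. It assumes "activation density" is the fraction of sampled tokens on which the feature's activation exceeds zero, which the dashboard does not spell out. The function activation_density and the toy activations are hypothetical.

```python
# Hypothetical sketch: assumes "activation density" means the fraction of
# sampled tokens on which the feature's activation is above zero.


def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy usage: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.200%
```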