Explanations
exclamations or expressions of emotion
expressions of congratulations and reassurance
Negative Logits
artney      -0.82
cious       -0.72
itialized   -0.68
ritic       -0.66
pend        -0.66
inosaur     -0.62
vey         -0.61
eatured     -0.61
dinand      -0.59
fund        -0.59
Positive Logits
!           1.07
!,          1.05
!:          1.00
!.          0.98
!]          0.91
!),         0.87
!).         0.87
!!          0.87
!)          0.85
!'          0.84
Activation Density: 0.149%
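Activation density is commonly read as the fraction of tokens in the evaluation data on which the feature fires. The sketch below is a minimal illustration under stated assumptions: per-token activations arrive as a flat list, and "fires" means an activation strictly above a threshold of zero. Neither the data layout nor the threshold is taken from this dashboard, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: `activations` is a flat list of per-token feature
    activations, and a token counts as "firing" when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical toy data: 3 of 2000 tokens fire.
acts = [0.0] * 2000
acts[10] = 1.3
acts[500] = 0.4
acts[1999] = 2.1

print(f"{activation_density(acts):.3%}")  # 0.150%
```

A density near 0.15%, as reported above, means the feature is silent on the vast majority of tokens, which is consistent with it being tied to a narrow context such as exclamation marks.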