Explanations
exclamations and interjections
expressions of surprise or disbelief
Negative Logits
abb        -0.68
orate      -0.67
copy       -0.65
edge       -0.64
ictions    -0.64
Central    -0.61
itations   -0.61
icip       -0.60
Hispanic   -0.59
atics      -0.59
Positive Logits
Oh         3.31
oh         2.15
Oh         1.98
Ah         1.66
Huh        1.65
Hey        1.65
Uh         1.57
Yeah       1.49
Wow        1.46
Okay       1.44
Activation Density: 0.010%