Explanations
conversations and interactions between characters in a text
expressions of humor and light-hearted dialogue
Negative Logits
NCT              -0.71
batters          -0.62
é¾įå             -0.62
Semin            -0.60
hower            -0.60
Stat             -0.60
predominantly    -0.59
ordes            -0.59
uably            -0.59
Mand             -0.59
Positive Logits
-"               1.45
…"               1.33
..."             1.27
—"               1.15
…"               1.15
!?"              1.07
…."              1.06
?"               1.06
!"               0.99
?!"              0.98
Activations Density 0.424%