Explanations
phrases containing speech-related words or direct quotes
instances of expressive or emotionally charged dialogue
Negative Logits
token       logit
artifacts   -0.89
issance     -0.84
olicy       -0.82
emis        -0.79
profits     -0.76
ropolis     -0.76
cair        -0.75
hesda       -0.75
etheless    -0.74
dates       -0.74
Positive Logits
token           logit
whisper         1.17
understatement  1.17
voice           1.16
loud            1.14
plaint          1.12
tone            1.12
tones           1.10
sarcastic       1.09
chorus          1.06
tongue          1.05
Activations Density: 0.173%