INDEX
Explanations
phrases containing people's names
instances of dialogue or direct speech
Negative Logits
interchange    -0.68
exha           -0.65
fueling        -0.65
interstitial   -0.65
economies      -0.63
outcomes       -0.63
encount        -0.62
departures     -0.62
obser          -0.61
remaining      -0.61
Positive Logits
huh            1.03
Gentleman      0.88
hov            0.87
thank          0.81
please         0.81
oh             0.78
yeah           0.75
beware         0.75
Disciple       0.74
usalem         0.73
Activations Density: 0.360%