INDEX
Explanations
references to the reader or audience in a conversational context
Negative Logits
Token         Logit
Excellence    -0.69
votes         -0.64
airs          -0.63
Innocent      -0.62
ween          -0.61
adium         -0.61
Political     -0.60
icy           -0.59
ensable       -0.59
Scarborough   -0.57
Positive Logits
Token         Logit
're           1.43
'll           1.36
've           1.21
guessed       1.20
can           1.06
'd            1.01
guys          0.99
know          0.99
mileage       0.99
tub           0.98
Activations Density: 0.120%