INDEX
Explanations
quotations within text
quotation marks and dialogue
Negative Logits
eleph      -0.85
quir       -0.83
avatar     -0.78
occas      -0.77
pumped     -0.77
squid      -0.76
stra       -0.76
plaster    -0.75
subur      -0.75
booked     -0.75
Positive Logits
Let           1.59
Tonight       1.58
Look          1.57
If            1.56
Everybody     1.55
Clearly       1.55
I             1.54
Enough        1.54
Nobody        1.54
Absolutely    1.53
Activations Density: 0.137%