INDEX
Explanations
quotations within texts
dialogue or quotes in the text
Negative Logits
describ       -0.80
looph         -0.79
rul           -0.78
favour        -0.78
edged         -0.78
moder         -0.77
vegetarian    -0.76
spill         -0.75
¥ŀ            -0.75
discont       -0.74
Positive Logits
Everybody     1.87
We            1.83
They          1.81
Honestly      1.81
It            1.78
Obviously     1.76
Absolutely    1.75
I             1.74
Yeah          1.74
Everything    1.72
Activations Density: 0.153%