INDEX
Explanations
sentences ending with a period
dialogue and quotations in the text
Negative Logits
unsus         -0.81
autonom       -0.79
privat        -0.79
holdings      -0.78
authorised    -0.78
coerc         -0.77
predec        -0.77
nude          -0.77
explicitly    -0.76
censored      -0.76
Positive Logits
Hopefully     1.45
Obviously     1.38
Guys          1.35
Hopefully     1.29
Obviously     1.28
Coach         1.25
Whoever       1.19
Everybody     1.17
Playing       1.17
Everybody     1.16
Activations Density: 0.207%