INDEX
Explanations
vulgar or offensive language and terms
Negative Logits
HCR           -0.73
contracted    -0.68
distingu      -0.66
âĢ¢âĢ¢âĢ¢âĢ¢  -0.66
forfeiture    -0.64
åĬ            -0.63
Dialogue      -0.63
APH           -0.63
AUT           -0.63
livest        -0.62
Positive Logits
glers    1.19
bags     1.14
bag      1.07
boy      1.07
holes    1.03
heads    1.02
boys     1.00
ery      0.99
tails    0.94
hole     0.94
Activations Density 3.904%