Explanations
profane and aggressive language
profanity and derogatory terms
Negative Logits
Token           Logit
striking        -0.77
Annotations     -0.71
Canaver         -0.71
comprehens      -0.67
Emanuel         -0.67
Ambro           -0.67
prolonged       -0.66
departures      -0.65
descending      -0.64
statically      -0.64
Positive Logits
Token           Logit
****            1.25
ookie           1.23
*****           1.18
aaaa            1.18
itty            1.15
anky            1.11
oooo            1.11
aaa             1.10
fuck            1.08
$$              1.08
Activations Density: 0.279%