INDEX
Explanations
discussions about harmful statements or language that imply threats or violence
Negative Logits
tartalomajánló    -0.42
queryInterface    -0.41
Autoritní         -0.36
ViewInit          -0.36
overzicht         -0.35
виправивши        -0.35
scales            -0.34
stories           -0.34
Previews          -0.34
HtmlAttribute     -0.34
Positive Logits
uttered           1.55
uttering          1.13
utterance         1.09
spoken            1.07
uttered           1.01
pronunci          1.00
muttered          1.00
spoken            0.96
Spoken            0.92
Spoken            0.92
Activations Density: 0.605%