INDEX
Explanations
language related to expressions of hate or derogatory comments directed at individuals or groups
Negative Logits
AppCompat        -0.57
stories          -0.52
文章             -0.51
document         -0.51
note             -0.50
detal            -0.50
detail           -0.50
FileDescriptor   -0.50
chronicles       -0.49
وض               -0.47
Positive Logits
uttered           0.92
uttering          0.75
utterances        0.71
glGen             0.70
makeConstraints   0.69
ConstraintMaker   0.68
utterance         0.67
DispatchToProps   0.66
فاده              0.66
GenerationType    0.66
Activations Density: 0.098%