Explanations
derogatory or offensive language
Negative Logits
Token            Logit
ArgsConstructor  -0.58
modernize        -0.55
VIAF             -0.55
ویکیپدیای        -0.54
Ders             -0.53
原始内容存档于    -0.52
PROBE            -0.51
RTGC             -0.51
بوابة            -0.51
Tall             -0.49
Positive Logits
Token  Logit
shit   1.97
crap   1.72
SHIT   1.71
Shit   1.68
shit   1.66
Shit   1.61
shite  1.45
shits  1.45
crap   1.31
Crap   1.25
Activations Density 0.381%