INDEX
Explanations
content that contains offensive language or illegal material
offensive or hateful content
Negative Logits
Dio        -0.45
Sino       -0.44
pk         -0.44
IBOutlet   -0.43
program    -0.43
IFF        -0.43
Ud         -0.43
micro      -0.42
NIS        -0.42
back       -0.42
Positive Logits
ainfi            0.53
verwijspagina    0.52
guiente          0.52
enfermed         0.52
Jurí             0.51
humanidade       0.48
Infór            0.47
Púb              0.47
thérape          0.47
ErrIntOverflow   0.47
Activations Density: 0.110%