INDEX
Explanations
detecting fake or hateful content
Negative Logits
ar        1.55
ez        1.30
es        1.22
ため      1.16
ea        1.15
er        1.14
as        1.13
acoli     1.10
lighting  1.08
iéndose   1.07
Positive Logits
ओम         1.30
structs    1.29
straining  1.29
cherish    1.25
cues       1.24
inception  1.22
onset      1.20
trouble    1.19
undercut   1.18
adhering   1.16
Activations Density 0.001%