INDEX
Explanations
mentions of the term "Nazi" and its variations related to extremist ideologies
Negative Logits
Token                  Logit
andon                  -0.16
quil                   -0.16
aiser                  -0.14
InternalServerError    -0.14
adge                   -0.14
.brand                 -0.14
nia                    -0.14
itzer                  -0.14
ovah                   -0.13
365                    -0.13
Positive Logits
Token                  Logit
areth                  0.36
ional                  0.26
ionale                 0.25
ionales                0.20
urally                 0.18
arius                  0.17
DAQ                    0.17
ario                   0.16
daq                    0.16
arov                   0.16
Activations Density: 0.006%
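
To give a sense of scale for the density figure, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of sampled token positions on which the feature has a nonzero activation; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 6 of which are active.
acts = [0.0] * 100_000
for i in range(6):
    acts[i * 1000] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.006%
```

Under that assumption, a density of 0.006% corresponds to roughly 6 active positions per 100,000 sampled tokens, which would make this a very sparse feature.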