INDEX
Explanations
ethnic and cultural references
references to groups related to terrorism or extremist organizations
Negative Logits
priv        -0.61
Speedway    -0.60
Malf        -0.59
ãĥ¼ãĥĨ      -0.58
Dise        -0.57
Fn          -0.57
Fol         -0.56
govtrack    -0.56
faculties   -0.55
inval       -0.55
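The token "ãĥ¼ãĥĨ" in the negative list looks like a byte-level BPE vocabulary string rather than corrupted text. The sketch below assumes the underlying model uses the GPT-2 style byte-to-unicode table (the page does not say which tokenizer is in use) and inverts that table to recover the original characters. The function names are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each vocabulary character back to its byte, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĥĨ"))  # prints the katakana pair ーテ
```

Under that assumption the token is a two-character katakana fragment, so it belongs in the list as a legitimate vocabulary entry.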
Positive Logits
yz       0.77
ovsky    0.74
nik      0.68
anski    0.66
ofer     0.65
amed     0.65
iew      0.65
osate    0.64
orb      0.64
oz       0.64
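The two logit lists can be read as the tokens a feature pushes hardest toward or away from. A minimal sketch of one common way such lists are produced, assuming the feature has a direction vector that is projected through the model's unembedding matrix (the page does not state its exact method). The vocabulary, matrix, and direction below are toy values invented for illustration and are unrelated to the numbers above.

```python
def logit_effects(direction, unembed, vocab):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        tok: sum(d * w for d, w in zip(direction, row))
        for tok, row in zip(vocab, unembed)
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Hypothetical 4-token vocabulary with 2-dimensional unembedding vectors.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -0.5]]
direction = [0.8, 0.2]  # hypothetical feature direction

positive, negative = top_and_bottom(logit_effects(direction, unembed, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```

In this toy setup the positive list is headed by tok_a and the negative list by tok_c, mirroring the layout of the two tables.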
Activations Density 0.340%
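The density figure is usually read as the fraction of sampled tokens on which the feature fires at all. A minimal sketch under that assumption, using a made-up activation list sized so the result matches the format shown; the threshold and the data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 5000 token positions, 17 of which activate the feature.
sample = [0.0] * 4983 + [1.5] * 17
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.340%
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens.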