Explanations
references to social media and online interactions
Negative Logits
urlencoded      -0.52
charset         -0.51
($__            -0.51
Berber          -0.49
للاسماء         -0.47
قایناقلار       -0.47
GeneratedCode   -0.46
ुन              -0.46
csrf            -0.46
svr             -0.45
Positive Logits
1.20
social
1.16
1.15
1.12
1.02
0.98
0.98
sociala
0.96
0.95
0.93
Activations Density 0.250%