Explanations
references to organizations and their actions
Negative Logits
فريبيس        -0.80
batis         -0.64
cillor        -0.64
XtraGrid      -0.62
BoxFit        -0.60
=\""          -0.60
Diweddarwch   -0.60
躇            -0.59
pinulongan    -0.59
تانيه         -0.58
Positive Logits
AntiForgeryToken   0.66
Handlung           0.65
룸                 0.61
recently           0.52
<bos>              0.52
.                  0.51
[toxicity=0]       0.51
sibi               0.50
broadly            0.50
,                  0.49
Activation Density 0.251%