Explanations
expressions of opposition or resistance to various ideas, proposals, or policies
Negative Logits
Token           Logit
neutral         -0.16
ood             -0.16
security        -0.16
ecurity         -0.15
set             -0.15
211             -0.15
asia            -0.15
security        -0.14
γκα             -0.14
erea            -0.14
Positive Logits
Token           Logit
lico            0.15
убли            0.14
.scalablytyped  0.14
craft           0.14
venta           0.14
agedList        0.14
Craft           0.13
angi            0.13
orge            0.13
ernaut          0.13
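The two ranked lists above can be read as the feature's direct effect on next-token logits. The sketch below shows one common way such lists are produced, assuming the score for each token is the feature's decoder vector multiplied by the model's unembedding matrix. This page does not state its exact method, and the function name `top_logit_effects` and the toy weights are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by one feature's direct logit effect.

    Assumption: effect = w_dec_row @ W_U, where w_dec_row is the feature's
    decoder vector (d_model,) and W_U is the unembedding (d_model, d_vocab).
    """
    effects = w_dec_row @ W_U              # one score per vocabulary token
    order = np.argsort(effects)            # indices sorted ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random (hypothetical) weights.
rng = np.random.default_rng(0)
d_model, d_vocab = 8, 20
W_U = rng.normal(size=(d_model, d_vocab))
w_dec_row = rng.normal(size=d_model)
w_dec_row /= np.linalg.norm(w_dec_row)     # decoder rows are often unit-norm
vocab = [f"tok{i}" for i in range(d_vocab)]

neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=3)
for tok, val in neg + pos:
    print(f"{tok:8s}{val:+.2f}")
```

Sorting the full score vector once and slicing both ends gives the negative and positive lists in a single pass; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid the full sort.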
Activation Density: 0.086%
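Activation density is usually reported as the share of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly greater than zero; the dashboard's exact threshold and sample size are not given here, and `activation_density` is a hypothetical helper. Under that assumption, 0.086% corresponds to roughly 86 firing tokens per 100,000.

```python
import numpy as np

def activation_density(acts):
    """Percentage of tokens where the feature fires (assumed: activation > 0)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Hypothetical sample: 86 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:86] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.086%
```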