INDEX
Explanations
statements or proposals related to environmental or health policies
Negative Logits
Token      Logit
eva        -0.14
داری       -0.14
起         -0.14
τή         -0.14
ptest      -0.14
Forbidden  -0.13
жень       -0.13
svc        -0.13
irim       -0.13
andest     -0.13
Positive Logits
Token      Logit
unless     0.31
Unless     0.26
unless     0.23
Unless     0.23
society    0.22
there      0.21
we         0.20
if         0.20
solutions  0.20
without    0.20
Activations Density 0.353%