Explanations
references to institutional frameworks and regulations affecting individual rights and responsibilities
Negative Logits
â̦↵↵
-0.19
Âł
-0.19
â̦↵
-0.17
:↵
-0.17

-0.15
”
-0.15
:↵↵
-0.15
â̦
-0.15
â̦.
-0.14
”
-0.13
Positive Logits
\↵
0.49
\↵
0.48
,\↵
0.36
"\↵
0.29
"\↵
0.29
ãĢģ↵
0.28
"+↵
0.28
"+↵
0.27
\č↵
0.27
(↵
0.26
Activations Density: 8.944%