Explanations
mentions of legal or political terms related to regulations, human rights, and government activities
Negative Logits
Token       Logit
Weston      -0.70
Naples      -0.57
Winc        -0.54
LH          -0.52
Kro         -0.52
ieri        -0.52
KS          -0.52
reprinted   -0.51
NAS         -0.51
Watkins     -0.51
Positive Logits
Token       Logit
"           1.43
"?          1.39
"—          1.39
%"          1.38
'?          1.32
"'          1.30
"-          1.30
[/          1.29
",          1.28
"!          1.28
Activations Density: 4.696%
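The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero (a common convention for feature dashboards, not confirmed here). The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens where the
    feature fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 8 tokens.
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 37.500%
```

Under this reading, a density of 4.696% would mean the feature fires on roughly one in every 21 sampled tokens.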