INDEX
Explanations
references to legal, crime, and political terms
phrases related to interrogation or questionable ethical practices
Negative Logits
ãĤ´ãĥ³    -0.59
):    -0.46
ãĥ¯ãĥ³    -0.46
"#    -0.45
é¾    -0.45
NFC    -0.43
©¶æ¥µ    -0.43
"$    -0.43
ËĪ    -0.42
Riders    -0.42
Positive Logits
..."    1.70
.")    1.62
â̦"    1.57
)"    1.47
)."    1.46
%"    1.45
,'"    1.42
)",    1.41
..."    1.40
."[    1.39
Activations Density: 3.139%
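
Several entries in the logit lists above (for example ãĤ´ãĥ³ and ãĥ¯ãĥ³) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to map such strings back to text, assuming the underlying model uses a GPT-2-style byte-to-unicode table. That assumption is not stated anywhere on this page, and the helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the page does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a displayed vocabulary string back to text.

    Tokens holding only part of a multi-byte character decode to U+FFFD
    placeholders, because a single token need not be valid UTF-8 by itself.
    Characters outside the table raise KeyError.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĤ´ãĥ³"))  # ゴン
    print(decode_token("ãĥ¯ãĥ³"))  # ワン
```

Under this assumption the two most garbled-looking negative-logit tokens are ordinary katakana fragments. Entries that decode to replacement characters are most likely partial characters split across token boundaries.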