INDEX
Explanations
legal and policy-related terms and concepts
Negative Logits
Token      Logit
borough    -0.80
bard       -0.78
bage       -0.72
xon        -0.71
ko         -0.70
kind       -0.69
kaya       -0.69
boy        -0.68
ascus      -0.68
been       -0.65
Positive Logits
Token            Logit
us               1.01
unrestricted     0.85
users            0.83
withdrawals      0.80
experimentation  0.79
Reviewer         0.79
access           0.78
rapists          0.78
passers          0.76
me               0.76
Activations Density: 0.575%