Explanations
phrases related to legal protections or rights
references to legal and social protections
Negative Logits
istg     -0.65
oca      -0.61
iche     -0.60
NAS      -0.59
ravis    -0.59
bus      -0.58
pulp     -0.58
tires    -0.58
Sus      -0.57
Stain    -0.57
Positive Logits
protections     1.10
enshr           1.06
afforded        0.93
safeguards      0.86
Rights          0.77
guaranteeing    0.77
protect         0.73
protecting      0.71
Ambro           0.71
rights          0.70
Activations Density 0.034%