INDEX
Explanations
laws, policies, or measures put in place to ensure the safety or rights of certain groups or individuals
references to various types of protections and safeguards
Negative Logits
ergy         -0.70
Leader       -0.63
ingly        -0.63
NAS          -0.61
parts        -0.61
istg         -0.61
Asia         -0.60
confessions  -0.60
ople         -0.56
Else         -0.56
Positive Logits
mith          0.99
protections   0.97
imposed       0.97
poons         0.91
hift          0.91
governing     0.90
safeguards    0.86
enshr         0.85
heet          0.84
cape          0.83
Activation Density: 0.117%