INDEX
Explanations
phrases related to emphasizing the importance or rationale behind a particular action or decision
references to concepts of value and purpose
Negative Logits
Cosponsors        -1.19
DragonMagazine    -0.82
falls             -0.77
til               -0.73
aults             -0.69
venants           -0.67
ILCS              -0.67
glass             -0.66
offer             -0.66
hook              -0.66
Positive Logits
preserving        1.10
fairness          1.05
integrity         0.97
protecting        0.95
decency           0.92
confidentiality   0.91
justice           0.91
efficiency        0.90
simplicity        0.90
purity            0.89
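A minimal sketch of how ranked logit lists like the two above are commonly produced, assuming each score is the dot product of the feature's decoder direction with a token's unembedding vector, and the lists keep the k lowest and k highest scores. The dashboard does not state its exact method; the function names, the row-per-token layout of `unembed`, and the toy numbers are hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> list[tuple[str, float]]:
    """Score each token by projecting the feature's decoder direction onto it.

    Hypothetical layout: unembed[i] is the d_model-sized unembedding
    vector for vocab[i] (i.e. one row per token).
    """
    if len(unembed) != len(vocab):
        raise ValueError("need one unembedding vector per token")
    scores = []
    for token, vec in zip(vocab, unembed):
        if len(vec) != len(decoder_dir):
            raise ValueError("dimension mismatch")
        # Dot product: how much the feature pushes this token's logit up or down.
        score = sum(d * u for d, u in zip(decoder_dir, vec))
        scores.append((token, score))
    return scores


def top_and_bottom(scores: list[tuple[str, float]], k: int):
    """Return (most negative k, most positive k), each ordered by magnitude."""
    ordered = sorted(scores, key=lambda pair: pair[1])
    negative = ordered[:k]          # lowest scores first
    positive = ordered[::-1][:k]    # highest scores first
    return negative, positive
```

With a toy two-dimensional model, `logit_effects([1.0, -0.5], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], ["fairness", "hook", "glass"])` scores the tokens 1.0, -0.5 and 0.25, so `top_and_bottom(..., 1)` puts "hook" in the negative list and "fairness" in the positive list.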
Activation Density: 0.134%
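A minimal sketch of the density figure, assuming it is the percentage of sampled tokens on which the feature's activation exceeds zero. The sampling procedure and the zero threshold are assumptions, and `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("no activations to measure")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this reading, 0.134% means the feature fires on roughly 134 of every 100,000 tokens, e.g. `activation_density([0.0] * 99_866 + [1.0] * 134)` gives approximately 0.134.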