Explanations
references to fairness, justice, and equality
Negative Logits
PROV        -0.81
aneous      -0.70
arij        -0.66
Norn        -0.65
ATED        -0.65
Assembly    -0.63
pta         -0.61
ulous       -0.61
PE          -0.58
odied       -0.58
Positive Logits
Weasley     0.93
ships       0.88
fort        0.85
iton        0.84
lyn         0.84
nell        0.83
Ñĭ          0.80
itudes      0.79
ies         0.79
\\\\\\\\    0.79
Activations Density 3.335%