INDEX
Explanations
references to judges
Negative Logits
usp       -0.81
uld       -0.74
ijk       -0.72
ulates    -0.71
eworld    -0.71
irst      -0.67
haar      -0.65
ulated    -0.64
iliar     -0.63
td        -0.62
Positive Logits
Judge     0.98
Judge     0.94
Advocate  0.89
Gorsuch   0.84
jud       0.83
istrate   0.82
Judy      0.77
orneys    0.76
yers      0.76
rulings   0.75
Activations Density 0.005%