Explanations
references to specific individuals tied to legal or judicial contexts
Negative Logits
er        -0.23
erland    -0.18
y         -0.17
hong      -0.16
raph      -0.16
larger    -0.15
herent    -0.15
oops      -0.15
eru       -0.15
rote      -0.14
Positive Logits
ainties     0.22
unately     0.20
ech         0.19
urb         0.19
brates      0.18
ificates    0.17
uche        0.17
ti          0.17
ebra        0.17
icipants    0.17
Activations Density 0.010%