INDEX
Explanations
words related to law enforcement actions and legal terminology
Negative Logits
Token        Logit
lihood       -0.72
BSD          -0.71
intendent    -0.66
beware       -0.64
manship      -0.63
terday       -0.63
bold         -0.62
taunt        -0.62
tom          -0.62
nown         -0.62
Positive Logits
Token        Logit
achable      1.24
ention       1.11
rans         1.10
ainer        1.09
roit         1.06
ailed        1.06
uned         1.02
ainers       0.98
rit          0.98
rag          0.98
Activations Density: 0.029%