INDEX
Explanations
words related to legal advice or legal contexts
Negative Logits
Token      Logit
p          -0.18
pent       -0.16
ags        -0.15
836        -0.15
rix        -0.15
repro      -0.15
Gh         -0.14
neo        -0.14
pic        -0.14
rong       -0.14
Positive Logits
Token      Logit
ffer       0.20
tracted    0.19
emer       0.19
actively   0.18
owler      0.18
verbs      0.17
sth        0.17
forma      0.17
bon        0.17
pped       0.17
Activations Density 0.032%