Explanations
instances of the substring "tr" in various forms
Negative Logits
Token       Logit
tr          -0.18
tring       -0.16
edly        -0.16
HEMA        -0.15
arkin       -0.15
callable    -0.15
igne        -0.14
Appropri    -0.14
esp         -0.14
comm        -0.14
Positive Logits
Token       Logit
uck         0.25
unk         0.21
ail         0.20
acked       0.20
actor       0.19
ails        0.18
ains        0.18
ucks        0.18
ough        0.17
ims         0.17
Activations Density: 0.020%