INDEX
Explanations
phrases that convey actions or events in various contexts
Negative Logits
æīį       -0.18
ushman    -0.15
Jessica   -0.15
æīįèĥ½    -0.15
گاÙĨ      -0.14
885       -0.14
dual      -0.14
Mall      -0.14
wides     -0.14
زÙĦ       -0.14
Positive Logits
trata     0.44
tratt     0.39
trat      0.33
tr        0.29
tr        0.29
dealing   0.28
-tr       0.28
Tr        0.28
-Tr       0.27
.tr       0.26
Activations Density 0.024%