Explanations
phrases indicating involvement or participation in actions or events
Negative Logits
urat      -0.17
ica       -0.15
_apply    -0.15
770       -0.14
olia      -0.14
anje      -0.14
Dre       -0.14
opo       -0.14
<+        -0.14
arn       -0.14
Positive Logits
by        0.23
bợi       0.20
oleh      0.19
lung      0.16
توسط      0.15
zes       0.14
*)_       0.14
rung      0.14
MPI       0.14
layer     0.14
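The page does not say how these logit lists are produced. A common convention for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the vocabulary entries with the highest and lowest resulting values. The sketch below uses plain Python; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not the dashboard's actual code.

```python
def logit_effects(decoder_row, unembed):
    """Project one feature's decoder direction through the unembedding.

    Assumption: decoder_row is the feature's row of W_dec (length d_model)
    and unembed is W_U given as d_model rows of vocab-size floats.
    Returns one logit effect per vocabulary entry.
    """
    d_model = len(decoder_row)
    vocab = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_and_bottom(effects, tokens, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    # Sort token ids by effect, most negative first.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    # Most positive first for the "positive logits" list.
    top = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
    effects = logit_effects([1.0, 0.5], [[0.2, -0.1, 0.0, 0.3],
                                         [0.1, 0.2, -0.4, 0.0]])
    top, bottom = top_and_bottom(effects, ["by", "the", "urat", "oleh"], k=2)
    print("positive:", top)
    print("negative:", bottom)
```

In the toy run, "oleh" and "by" come out as the positive entries and "urat" and "the" as the negative ones; a real dashboard would run the same projection over the full vocabulary with k = 10.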
Activation density: 0.337%
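Activation density is usually read as the share of sampled token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the page states neither the threshold nor the sample size), a density of 0.337% corresponds to about 337 active positions per 100,000 tokens. A minimal sketch, where `activation_density` is a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature is active.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 337 active positions out of 100,000 tokens.
    sample = [0.0] * 99_663 + [1.2] * 337
    print(f"{activation_density(sample):.3f}%")
```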