Explanations
phrases describing the actions and characteristics of certain groups or entities
Negative Logits
itself      -0.37
its         -0.24
Its         -0.19
Its         -0.18
eme         -0.16
उसक         -0.16
ara         -0.16
及其        -0.16
kendini     -0.15
نفسه        -0.15
Positive Logits
themselves   0.34
selves       0.20
/we          0.18
śmy          0.17
respectively 0.17
lượt         0.17
re           0.17
’re          0.16
umber        0.15
atical       0.15
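The two logit lists above appear to come from a sparse-autoencoder feature dashboard. In such dashboards each score is usually the feature's decoder direction projected through the model's unembedding matrix. The sketch below assumes exactly that and ignores any final LayerNorm scaling that real tools may fold in. `feature_logit_effects` and its arguments are hypothetical names, and the toy numbers in the usage line exist only to exercise the function.

```python
from typing import List, Tuple


def feature_logit_effects(
    decoder_dir: List[float],      # hypothetical: the feature's decoder row, length d_model
    unembed: List[List[float]],    # hypothetical: unembedding matrix, shape [d_model][vocab]
    vocab: List[str],              # token strings, one per unembedding column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by decoder_dir @ unembed.

    Assumption: scores are a plain dot product with the unembedding;
    no LayerNorm or bias terms are applied.
    """
    d_model = len(decoder_dir)
    # One score per vocabulary token: how much the feature pushes that logit.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending once, then read both ends of the ranking.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy usage with an invented 2-dimensional model and 3-token vocabulary.
pos, neg = feature_logit_effects(
    [1.0, -0.5],
    [[0.3, -0.3, -0.1], [-0.2, 0.2, 0.2]],
    ["themselves", "itself", "its"],
    k=2,
)
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary.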
Activation density: 0.377%
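Activation density is commonly reported as the percentage of token positions on which the feature fires. The hypothetical helper below assumes that definition, with "fires" meaning an activation strictly above a threshold (0 by default). Under that reading, 0.377% means the feature is active on roughly 377 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: density is measured over a flat sequence of per-token
    activations for a single feature.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: 3 active positions out of 1,000 gives a density of 0.3%.
density = activation_density([0.0] * 997 + [1.2, 0.5, 3.0])
```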