Explanations
verb phrases indicating actions or states of being
Negative Logits
iden     -0.17
uit      -0.15
eph      -0.14
èĢ       -0.14
_lazy    -0.14
arp      -0.13
Deniz    -0.13
RT       -0.13
olin     -0.13
pub      -0.13
Positive Logits
be       0.22
rades    0.18
have     0.16
avoir    0.16
having   0.16
been     0.16
.have    0.15
being    0.15
zych     0.14
меть     0.14
Activations Density: 0.076%