INDEX
Explanations
negations or limitations regarding actions or qualities
Negative Logits
igli    -0.15
ares    -0.15
Vas     -0.14
ili     -0.14
ali     -0.14
chine   -0.13
ombo    -0.13
onga    -0.13
ign     -0.13
itta    -0.13
Positive Logits
newly       0.24
recently    0.21
recent      0.19
Newly       0.18
currently   0.18
Recently    0.18
henüz       0.17
已          0.17
currently   0.17
nově        0.17
Activations Density 0.021%