INDEX
Explanations
how, where, when, which describe actions
Negative Logits
Semua      -1.00
hichever   -0.96
centaje    -0.94
所有       -0.93
všet       -0.91
Yosh       -0.89
delivr     -0.88
всі        -0.88
lewood     -0.87
moncler    -0.87
Positive Logits
they       1.55
we         1.23
meals      1.09
goods      0.99
리를       0.98
you        0.93
these      0.92
something  0.91
इसे        0.90
można      0.89
Activations Density 0.133%