INDEX
Explanations
the conjunctions and question words within a context
Negative Logits
Kaynak  -0.17
ood     -0.17
ı       -0.15
its     -0.15
ully    -0.15
Mayer   -0.14
PG      -0.14
ula     -0.14
dex     -0.14
вов     -0.14
Positive Logits
ifs     0.15
央      0.15
agan    0.15
à¥įयत   0.15
rá      0.15
rance   0.15
manner  0.14
rud     0.14
ÚĨÚ¯ÙĪÙĨÙĩ  0.14
adies   0.14
Activations Density 0.020%