Explanations
the letter 'w' in various contexts
Negative Logits
eki       -0.17
iversal   -0.17
ież       -0.17
ibold     -0.16
Coder     -0.16
utorial   -0.16
hci       -0.16
isters    -0.15
panies    -0.14
likler    -0.14
Positive Logits
nik           0.16
aza           0.16
edException   0.14
rag           0.14
aver          0.14
MAN           0.13
лиз           0.13
ÛĮÙĩ          0.13
agt           0.13
naked         0.13
Activations Density 0.020%