INDEX
Explanations
the repetition of the letter 'w' in various forms
Negative Logits
empt    -0.17
hu      -0.16
hang    -0.16
nÃŃ     -0.15
lob     -0.15
¯ÃĤ     -0.15
ع       -0.14
oped    -0.14
exc     -0.14
lets    -0.14
Positive Logits
irtschaft      0.21
hat            0.20
bsite          0.20
issenschaft    0.20
inder          0.18
anj            0.18
icz            0.18
istar          0.18
affle          0.17
avy            0.17
Activations Density 0.162%