INDEX
Explanations
instances of the word "ban" and its variations
Negative Logits
Token    Logit
oÄŁ      -0.17
ÛĮ       -0.16
chten    -0.15
oze      -0.15
klass    -0.14
elig     -0.14
ãĤĥ      -0.14
ODO      -0.14
aÄŁ      -0.14
eon      -0.13
Positive Logits
Token    Logit
ishment  0.32
ished    0.30
offee    0.29
quets    0.28
tering   0.25
ishing   0.25
jo       0.24
ister    0.23
jax      0.23
anas     0.23
Activations Density 0.013%