Explanations
word patterns that include "att" or "ott" at various positions
Negative Logits
Token     Logit
er        -0.19
ãģįãģŁ    -0.19
ove       -0.18
pen       -0.18
rics      -0.18
ook       -0.17
ric       -0.17
oho       -0.16
rus       -0.16
ru        -0.16
Positive Logits
Token     Logit
sville    0.24
orney     0.24
sburgh    0.22
anooga    0.22
orneys    0.22
t         0.20
ahoo      0.20
les       0.20
endor     0.20
erson     0.19
Activations Density: 0.031%