Explanations
words with the letter 'r' in various forms and contexts
Negative Logits
anos      -0.20
mos       -0.16
anders    -0.16
rod       -0.16
orners    -0.15
subst     -0.15
iler      -0.15
zen       -0.15
efe       -0.15
agrams    -0.14
Positive Logits
attach    0.24
ê         0.21
oya       0.21
appro     0.20
alent     0.20
ense      0.18
ég        0.18
oi        0.17
attr      0.17
iche      0.17
Activations Density 0.007%