Explanations
words related to confusion or complexity
Negative Logits
Token       Logit
le          -0.66
lej         -0.32
lea         -0.28
Leod        -0.27
lek         -0.27
leitung     -0.24
er          -0.22
lein        -0.22
lei         -0.22
leo         -0.19
Positive Logits
Token       Logit
lesh         0.31
led          0.27
LES          0.27
legate       0.26
lescope      0.25
les          0.25
ling         0.25
ler          0.25
lename       0.24
leground     0.24
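Lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below assumes exactly that (a plain dot product with W_U, no final layer norm or bias); `top_logit_effects`, `decoder_vec`, `W_U`, and `vocab` are hypothetical names, not the dashboard's actual code.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Assumed method: score every token by decoder_vec @ W_U, then return
    the k most negative and the k most positive (token, score) pairs."""
    effects = decoder_vec @ W_U              # shape: (vocab_size,)
    order = np.argsort(effects)              # token indices, ascending by score
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.5, 0.1, -0.2],
                [9.0,  9.0, 9.0,  9.0]])
neg, pos = top_logit_effects(decoder_vec, W_U, ["a", "b", "c", "d"], k=2)
```

Here only the first row of W_U contributes, so "b" and "d" come out most negative and "a" and "c" most positive.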
Activation Density: 0.059%
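Activation density is usually read as the fraction of sampled tokens on which the feature fires. Assuming "fires" means an activation above zero (the threshold is an assumption, as is the helper name `activation_density`), 0.059% corresponds to roughly 59 firing tokens per 100,000:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Hypothetical sample: 100,000 tokens, of which the feature fires on 59.
acts = np.zeros(100_000)
acts[:59] = 1.0
density = activation_density(acts)
```

Multiplying `density` by 100 gives the percentage shown on the card.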