Explanations
Instances of the character sequence "cla" followed by other letters, indicating a focus on words that start with "cla".
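As a rough illustration of the pattern this explanation describes, the sketch below finds words that begin with "cla" and continue with at least one more letter. The regular expression and the helper name `find_cla_words` are assumptions made for illustration; they approximate the explanation in plain string terms and say nothing about how the feature itself is computed.

```python
import re

# Hypothetical pattern: "cla" at a word boundary, followed by one or more letters.
# Case is ignored so that capitalized words are matched too.
CLA_WORD = re.compile(r"\bcla[a-z]+", re.IGNORECASE)


def find_cla_words(text: str) -> list[str]:
    """Return every word in `text` that starts with "cla" plus more letters."""
    return CLA_WORD.findall(text)


if __name__ == "__main__":
    # "declared" contains "cla" but does not start with it, so it is skipped.
    print(find_cla_words("Claire claimed the class was declared closed."))
```

The word-boundary anchor is the design choice that matters here: without it, the pattern would also match "cla" in the middle of a word, which the explanation does not claim.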
Negative Logits
Token      Logit
ruba       -0.17
il         -0.17
ban        -0.16
atee       -0.16
iliz       -0.15
frey       -0.15
aging      -0.14
ÛĮÙĦÛĮ     -0.14
è¶³        -0.14
banquet    -0.14
Positive Logits
Token      Logit
esson      0.29
ussen      0.28
assen      0.26
udio       0.22
ire        0.22
ude        0.22
ifornia    0.20
IRE        0.20
essen      0.19
ques       0.19
Activations Density 0.005%
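
To give the density figure a sense of scale, the sketch below converts a percentage into an expected count of active tokens. It assumes that "density" means the fraction of tokens on which the feature has a nonzero activation; that reading, and the helper name `expected_active_tokens`, are assumptions rather than something the listing states.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens with a nonzero activation.

    Assumes `density_percent` is the percentage of tokens on which the
    feature fires (an interpretation, not confirmed by the listing).
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # At 0.005%, roughly 50 tokens per million would activate the feature.
    print(expected_active_tokens(0.005, 1_000_000))
```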