INDEX
Explanations
occurrences of the letter "W" and related patterns
Negative Logits
erva    -0.17
ãĥªãĥ¼ãĤº    -0.15
èģ    -0.15
hare    -0.15
óc    -0.15
rang    -0.15
äter    -0.15
endale    -0.15
lesen    -0.15
loe    -0.15
Positive Logits
ai    0.20
wise    0.17
WISE    0.17
è    0.16
fund    0.16
McGu    0.16
conform    0.15
ollo    0.15
amp    0.15
ind    0.15
Activations Density 0.015%