Explanations
occurrences of the letter 'w' and its variations in case
Negative Logits
་་       -0.88
).)      -0.85
).}      -0.83
.";      -0.79
°;       -0.78
).]      -0.78
*}       -0.75
⦁        -0.74
())->    -0.72
')):     -0.70
Positive Logits
w        2.14
w        2.13
W        1.03
𝐰        0.98
ww       0.97
W        0.96
iw       0.94
mw       0.91
jw       0.90
𝙬        0.89
Activations Density: 0.098%