INDEX
Explanations
occurrences of the letter "w"
Negative Logits
Token        Logit
Theſe        -1.05
་་           -1.02
Monfieur     -0.97
―――――        -0.95
iſt          -0.95
myſelf       -0.91
Beſ          -0.90
themſelves   -0.89
ſeveral      -0.86
verſ         -0.84
Positive Logits
Token   Logit
w       1.99
W       1.90
W       1.70
w       1.62
b       1.11
d       0.95
h       0.94
r       0.93
𝙬       0.93
g       0.93
Activations Density 0.088%