INDEX
Explanations
instances of the letter 'u' in various contexts
Negative Logits
Theſe   -0.83
dring   -0.74
Anſ     -0.73
―――――   -0.72
truff   -0.72
corrid  -0.72
་་      -0.71
Beſ     -0.71
Camin   -0.69
الشر    -0.69
Positive Logits
u       1.88
U       1.67
U       1.43
u       1.25
uve     1.02
v       1.00
V       0.98
u       0.93
l       0.93
r       0.91
Activations Density 0.098%