Explanations
recalling details, context, or facts
NEGATIVE LOGITS
\           1.27
ف           0.96
'           0.93
ج           0.91
ive         0.79
amplio      0.79
anisotrop   0.78
?           0.77
à           0.76
ä           0.76
POSITIVE LOGITS
d           1.10
ר           1.05
ur          1.01
h           1.01
ת           1.01
м           1.00
z           0.97
ర్          0.96
AK          0.96
ే           0.96
Activation density: 0.062%