Explanations
capacity and size measurements
Negative Logits
۔  0.51
iz  0.50
ر  0.49
lik  0.47
jší  0.46
র  0.45
или  0.45
یک  0.45
iliş  0.44
інші  0.44
Positive Logits
at  0.73
on  0.61
지  0.56
ad  0.54
em  0.54
中  0.50
ak  0.50
as  0.49
ной  0.49
in  0.49
Activations Density: 0.000%