Explanations
abbreviated units and codes
Negative Logits
l    0.95
a    0.93
I    0.83
A    0.67
A    0.66
B    0.63
。   0.61
J    0.58
।    0.58
D    0.57
Positive Logits
на   1.14
ش    0.92
on   0.89
د    0.83
ح    0.78
ர    0.77
ص    0.77
و    0.76
ج    0.75
મ    0.75
Activations Density: 0.439%