INDEX
Explanations
code and programming comments
Negative Logits
CH      0.84
دون     0.81
Е       0.78
I       0.77
ד       0.73
드를    0.72
دت      0.72
م       0.72
ding    0.71
dB      0.69
Positive Logits
u       1.11
ul      0.92
и       0.92
ير      0.91
_       0.91
is      0.88
were    0.87
ze      0.87
on      0.86
zed     0.84
Activations Density 0.018%