Explanations
numbers after code constructs
Negative Logits
Token    Value
2        0.64
4        0.63
1        0.62
3        0.61
7        0.59
6        0.59
5        0.57
9        0.52
8        0.51
the      0.49
Positive Logits
Token         Value
puede         0.47
er            0.46
zelfde        0.45
இருக்கலாம்    0.45
P             0.45
in            0.43
ﺹ             0.43
rugated       0.42
$\|           0.42
כן            0.42
Activations Density: 0.453%