Explanations
numerical values and arithmetic operations
Negative Logits
Token    Logit
İstinadlar    -0.96
rungsseite    -0.90
ſelf    -0.84
contentLoaded    -0.84
ainfi    -0.84
"):    -0.82
[token missing]    -0.81
―――――    -0.81
برانيه    -0.79
iſt    -0.79
Positive Logits
Token    Logit
[token missing]    0.60
inter    0.54
-    0.53
.    0.53
↵↵    0.52
<eos>    0.50
(    0.49
=    0.47
entre    0.46
Inter    0.46
Activations Density 0.076%