INDEX
Explanations
foreign languages and punctuation
Negative Logits
c           1.06
ก           1.03
m           0.92
not         0.88
that        0.87
the         0.86
decentral   0.85
મ           0.84
b           0.82
be          0.80
Positive Logits
.           1.31
ت           1.20
۔           1.13
․           1.04
고          1.03
تمد         0.98
تهم         0.95
diuretics   0.95
4           0.95
া           0.95
Activations Density: 0.993%