INDEX
Explanations
dates with ordinal suffixes
Negative Logits
=	1.02
ה	0.94
ו	0.93
(token not shown)	0.85
a	0.84
(	0.82
-	0.79
;	0.79
ות	0.78
↵	0.77
Positive Logits
in	1.09
rá	0.82
nh	0.74
spiele	0.73
inação	0.71
íny	0.70
rón	0.69
栤	0.66
ಿರುವ	0.65
菄	0.65
Activations Density 0.005%