Explanations
phrases indicating quantifiable measures or evaluations
Negative Logits
Token       Logit
apers       -0.17
ahi         -0.17
.tw         -0.15
pard        -0.15
mans        -0.15
Madden      -0.14
raud        -0.14
Bacon       -0.14
تÙģ         -0.14
Ĥ           -0.14
Positive Logits
Token       Logit
åΰ          0.21
kepada      0.21
unto        0.20
eer         0.20
to          0.19
ToSelector  0.18
ToBounds    0.18
åΰäºĨ       0.17
اÙĦÙī       0.17
bersome     0.17
Activations Density: 0.012%