INDEX
Explanations
numerical data and statistics in a variety of contexts
Negative Logits
zin       -0.17
iu        -0.15
deck      -0.14
adele     -0.14
iphy      -0.14
stein     -0.14
aling     -0.14
horn      -0.14
.React    -0.14
Yıl       -0.14
Positive Logits
overall     0.39
overall     0.35
Overall     0.31
Overall     0.31
altogether  0.29
total       0.24
alto        0.21
συνο        0.20
вообще      0.19
total       0.19
Activations Density 0.070%