INDEX
Explanations
words related to certain specific numerical values or measurements
Negative Logits
atal      -0.17
erv       -0.16
ep        -0.15
ugo       -0.15
awn       -0.15
aram      -0.15
okol      -0.15
ivery     -0.15
psc       -0.15
endoza    -0.15
Positive Logits
ежду      0.27
ног       0.26
ного      0.25
ax        0.23
нение     0.23
нения     0.23
олод      0.22
нож       0.22
лад       0.21
узы       0.20
Activations Density 0.009%