Explanations
phrases that emphasize frequency or degree of experience
Negative Logits
гораздо    -0.48
удобно     -0.47
arakat     -0.45
hơn        -0.44
besser     -0.43
nzuri      -0.42
Efq        -0.42
verty      -0.41
ERE        -0.41
Arce       -0.41
Positive Logits
fier       0.80
prou       0.78
dir        0.77
classi     0.72
cra        0.72
flas       0.71
humb       0.71
nas        0.71
migh       0.69
HAP        0.67
Activations Density 0.389%