Negative Logits
stets          0.30
svet           0.27
стреми         0.27
pleasures      0.26
contentment    0.26
gratification  0.25
lutte          0.25
چنین           0.25
stillness      0.25
muros          0.25
Positive Logits
actually       0.33
其他的          0.28
później        0.28
referenced     0.28
referenced     0.27
显示            0.26
totiž          0.25
uradaki        0.25
indirectly     0.25
called         0.25
Activations Density 0.000%