Explanations
simple project or role play
Negative Logits
prépuce    0.44
ленным     0.43
сада       0.43
Firm       0.43
ября       0.41
dissati    0.41
Цу         0.41
덟         0.41
수한       0.41
долла      0.40

Positive Logits
right      0.55
مل         0.48
all        0.46
سر         0.46
fast       0.45
ranged     0.45
wind       0.44
which      0.44
Law        0.44
confirmed  0.43

Activations Density: 0.255%