Explanations
reaching zero or specific goals
Negative Logits
almost    0.61
quase     0.58
Almost    0.58
Almost    0.57
almost    0.55
minus     0.55
prawie    0.53
minus     0.50
Minus     0.50
hampir    0.49
Positive Logits
깔         0.41
будете    0.40
Ꮭ         0.40
缄         0.39
िएगा      0.39
揮         0.38
closure   0.38
بمعنى     0.38
абсолю    0.37
犸         0.37
Activations Density: 0.000%