INDEX
Explanations
we you they give tell provide
Negative Logits
Was      2.11
Was      2.04
was      1.99
was      1.88
быть     1.82
wasnt    1.70
WAS      1.69
Wasn     1.64
был      1.63
wasn     1.62
POSITIVE LOGITS
vengono     1.99
میشوند      1.72
đều         1.70
помогают    1.59
fazem       1.59
начинают    1.56
जातात       1.52
ficam       1.52
doivent     1.52
are         1.50
Activations Density: 0.345%