Explanations
positive change or improvement
Negative Logits
volving          0.38
ספר              0.37
itabbo           0.36
lhe              0.35
Ф                0.35
Classification   0.35
designations     0.34
িনী              0.33
указыва          0.33
ូប               0.33
Positive Logits
mejorar          0.79
improve          0.77
mejora           0.76
melhorar         0.72
migliorare       0.72
improves         0.70
cải              0.70
amélior          0.68
améliorer        0.68
улуч             0.67
Activations Density 0.115%