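Negative Logits
Token (gloss)                              Logit
целом (Russian, "whole")                   0.30
légère (French, "light")                   0.28
sparked                                    0.26
greatness                                  0.26
روم                                        0.26
chào (Vietnamese, "hello")                 0.26
서는 (Korean particle fragment)             0.26
вигляді (Ukrainian, "form")                0.26
सरकारों (Hindi, "governments")              0.26
більш (Ukrainian, "more")                  0.25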
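Positive Logits
Token (gloss)                              Logit
penting (Indonesian, "important")          0.49
important                                  0.48
tenets                                     0.48
Important                                  0.48
important                                  0.47
importantes ("important", plural)          0.46
इंपॉर्टेंट ("important", Devanagari)          0.45
Important                                  0.45
इंपोर्टेंट ("important", Devanagari)          0.45
concepts                                   0.45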
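The two logit lists can be read as the tokens whose output logits this feature pushes up or down most strongly. A minimal sketch of how such lists are commonly produced, assuming (not confirmed by this page) that they come from projecting the feature's decoder direction through the model's unembedding matrix. The function name `top_bottom_logits` and its parameters are hypothetical.

```python
import numpy as np

def top_bottom_logits(w_dec_feature, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by the feature's direct logit effect.

    w_dec_feature: (d_model,) decoder direction of one feature (assumed).
    W_U:           (d_model, d_vocab) unembedding matrix (assumed).
    vocab:         list of d_vocab token strings.
    """
    # Direct effect of the feature direction on every output logit.
    logits = w_dec_feature @ W_U  # shape (d_vocab,)
    order = np.argsort(logits)    # ascending: most suppressed tokens first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example: d_model = 3, d_vocab = 4.
vocab = ["a", "b", "c", "d"]
W_U = np.eye(3, 4)
w = np.array([0.5, -0.2, 0.1])
top, bottom = top_bottom_logits(w, W_U, vocab, k=2)
print(top)     # tokens the feature promotes most
print(bottom)  # tokens the feature suppresses most
```

Under this assumption, the clustering of "important" in several languages and scripts among the positive logits suggests the feature's direction aligns with a shared "importance" concept in the unembedding.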
Activation density: 0.083%
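Activation density is usually reported as the percentage of tokens on which a feature fires. A minimal sketch, assuming "fires" means an activation above zero; the helper name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean() * 100)

# Toy example: 10 active tokens out of 12,000.
acts = np.zeros(12_000)
acts[:10] = 0.3
print(round(activation_density(acts), 3))  # about 0.083
```

At 0.083%, the feature would be active on roughly 1 token in 1,200, which makes it a fairly sparse feature.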