Explanations
indicating increasing levels
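Negative logits
simpler: 0.51
डायरेक्टली (Hindi transliteration of "directly"): 0.49
легче (Russian, "easier"): 0.47
easier: 0.46
directly: 0.45
smaller: 0.45
சாதாரண (Tamil, "ordinary"): 0.45
directly (listed twice; likely a distinct token, e.g. with or without a leading space): 0.44
更容易 (Chinese, "easier"): 0.44
endangering: 0.44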
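Positive logits
একটু (Bengali, "a little"): 0.56
elaboration: 0.55
lengkap (Indonesian/Malay, "complete"): 0.52
embellished: 0.52
elaborate: 0.49
élabor (French stem of élaborer, "to elaborate"): 0.49
немного (Russian, "a little"): 0.48
விளக்க (Tamil, "explain"): 0.47
elaborado (Spanish/Portuguese, "elaborate"): 0.45
completeness: 0.45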
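Lists like these are commonly derived from a feature's direct effect on the output vocabulary. The sketch below assumes the feature's decoder vector is projected straight through the model's unembedding matrix, ignoring the final layer norm. `top_logit_effects` and the array layouts are hypothetical, and the dashboard may scale or normalize its scores differently, so this code is not expected to reproduce the exact numbers above.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Hypothetical helper; assumed layouts:
      w_dec_row: (d_model,) decoder direction of the feature
      w_unembed: (d_model, vocab_size) unembedding matrix
      vocab:     list of vocab_size token strings
    Returns (negative, positive), each a list of k (token, score) pairs.
    """
    # Direct logit effect: how much writing this direction raises or lowers each token's logit.
    scores = w_dec_row @ w_unembed
    order = np.argsort(scores)  # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]        # most suppressed
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]  # most promoted
    return negative, positive

# Toy example: a 2-dim residual stream and a 4-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.1, -0.4], [9.0, 9.0, 9.0, 9.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg, pos)
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other.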
Activation density: 0.004% (share of tokens on which the feature is active)
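A density of 0.004% works out to about 4 active tokens in every 100,000. The sketch below assumes that "active" means an activation strictly above a threshold of zero. `activation_density` and its `threshold` parameter are hypothetical and are not taken from the dashboard's code.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Hypothetical helper: `activations` is assumed to hold one activation
    value per token position, flattened across the sampled dataset.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 4 firing positions out of 100,000 gives a density of 0.004%.
acts = np.zeros(100_000)
acts[[10, 500, 7_000, 99_999]] = 1.0
print(activation_density(acts))
```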