Explanations
learning and skills acquisition
Negative Logits
indicating    0.40
<bos>         0.40
correctly     0.40
पतवार         0.37
opting        0.36
лод           0.36
滥            0.34
refuses       0.34
suggesting    0.34
Oscillator    0.34
Positive Logits
learn         1.68
learning      1.61
aprender      1.60
aprend        1.56
aprendizaje   1.53
learn         1.51
learns        1.51
aprende       1.48
學習          1.47
学习          1.45
Activations Density 0.021%