INDEX
Explanations
most important thing or advice
Negative Logits
every  0.88
manchmal  0.86
every  0.84
Every  0.82
Sometimes  0.81
आजकल  0.81
Many  0.81
Every  0.80
sometimes  0.79
منٹ  0.78
Positive Logits
blev  0.65
действу  0.61
exited  0.61
いましたが  0.61
remained  0.60
Yeah  0.60
underwent  0.58
did  0.58
变的  0.58
ነበር  0.58
Activations Density: 0.125%