Explanations
numerical tokens, especially around punctuation
Negative Logits
longer          0.33
tearing         0.30
willingly       0.28
complicated     0.27
rear            0.27
newly           0.27
each            0.27
mistakes        0.27
older           0.26
}).             0.26
Positive Logits
ибо             0.39
如果你          0.35
informática     0.34
क्रिप्ट          0.33
idk             0.33
ordeaux         0.32
астро           0.32
psicológica     0.32
interessante    0.32
식품            0.32
Activations Density: 0.005%