Explanations
punctuation and mixed languages
Negative Logits
ς           1.09
s           1.08
sı          1.05
lara        0.99
lardan      0.93
sail        0.89
ים          0.86
sene        0.85
CH          0.84
soldiers    0.82
Positive Logits
whereabouts 0.80
on          0.72
].”         0.71
礪          0.70
kudos       0.69
]<          0.68
indicato    0.68
”           0.67
২৮          0.65
つけて      0.65
Activations Density 1.621%