INDEX
Explanations
your kind words and affection
Negative Logits
se      0.48
manchas 0.45
um      0.45
eln     0.45
ulfite  0.45
apr     0.43
ll      0.43
ozyg    0.42
ana     0.42
ap      0.42
Positive Logits
капитала  0.50
היה  0.48
архі  0.46
постав  0.46
столи  0.46
تث  0.45
ת  0.43
עי  0.43
Пре  0.42
ยาว  0.42
Activations Density 0.013%