Explanations
pronouns followed by quote or verb
Negative Logits
fleste    1.71
俩    1.69
enan    1.58
Portanto    1.54
ỏe    1.48
łego    1.48
garten    1.48
ットン    1.45
fuera    1.42
ร    1.42
Positive Logits
ди    2.02
ну    1.88
elucid    1.84
ли    1.70
ни    1.69
ا    1.69
ك    1.67
으로써    1.65
也    1.62
skyrock    1.61
Activations Density 0.319%