INDEX
Explanations
possessive and individualizing words
Negative Logits
Token    Logit
tte      2.11
}        1.88
tion     1.87
le       1.84
ا        1.75
}])      1.66
ра       1.58
al       1.55
에       1.54
tus      1.52
Positive Logits
Token    Logit
Те       2.14
Д        2.00
बी       1.94
ে        1.86
ले       1.77
تی       1.74
И        1.73
Л        1.73
Га       1.71
То       1.67
Activations Density: 0.081%