INDEX
Explanations
possessive pronoun and relative pronoun
Negative Logits
nya        1.26
ni         1.18
つまり     1.16
tained     1.12
ms         1.07
la         1.07
dotycz     1.07
spring     1.06
lerinden   1.04
ts         1.03
Positive Logits
ل          1.16
ят         1.11
ن          1.05
ير         1.03
知道       1.02
л          1.02
ت          0.98
уравнения  0.91
多次       0.86
行う       0.86
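The page lists these tokens without saying how they were scored. On dashboards of this kind, a feature's logit effect is commonly computed by projecting the feature's decoder direction onto each token's unembedding vector and ranking the results. The sketch below assumes that method. `decoder_dir`, `unembed`, and the toy numbers are hypothetical and unrelated to the values above, and this is not the dashboard's own code. Real pipelines may also apply scaling such as a final layer norm, which is omitted here.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most suppressed first (most negative value)
    positive = ranked[::-1][:k]  # most promoted first (largest value)
    return positive, negative


# Toy example: 3-dimensional residual stream, 4-token vocabulary (made-up numbers).
decoder_dir = [1.0, -0.5, 0.0]
unembed = {
    "a": [0.2, 0.0, 1.0],   # effect 0.2
    "b": [1.0, 1.0, 0.0],   # effect 1.0 - 0.5 = 0.5
    "c": [0.0, 2.0, 0.0],   # effect -1.0
    "d": [-0.5, 0.0, 0.0],  # effect -0.5
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("promoted:", positive)    # promoted: [('b', 0.5), ('a', 0.2)]
print("suppressed:", negative)  # suppressed: [('c', -1.0), ('d', -0.5)]
```

In this sketch suppressed tokens keep their minus sign. The listing above shows unsigned values under "Negative Logits", so it may report magnitudes instead.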
Activation Density: 0.073%
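Activation density is usually read as the percentage of tokens on which the feature fires. A density of 0.073% means roughly 73 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The `threshold` parameter and the toy data are hypothetical, and the page does not state its exact definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature is active on 4 of 10,000 tokens (made-up data).
acts = [0.0] * 10_000
for i in (5, 42, 977, 8_000):
    acts[i] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.040%
```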