INDEX
Explanations
possessive pronouns and phrases indicating ownership or relationship
Negative Logits
olin      -0.16
iez       -0.16
akk       -0.15
know      -0.14
ere       -0.14
arih      -0.14
оч        -0.14
ombat     -0.14
irs       -0.13
ej        -0.13
Positive Logits
eding     0.17
ault      0.16
omorphic  0.15
aul       0.15
paged     0.14
άντα      0.14
-www      0.14
Visitor   0.14
úc        0.14
uel       0.14
Activations Density 0.018%