INDEX
Explanations
possessive pronouns and expressions of ownership or affiliation
Negative Logits
ubber     -0.15
UCE       -0.14
оÑĤÑĢеб    -0.14
Ñģом      -0.14
aney      -0.14
own       -0.13
iday      -0.13
olet      -0.13
æĭĵ       -0.13
èį        -0.13
Positive Logits
Bast      0.16
erva      0.16
orr       0.14
inkel     0.14
Ãł        0.14
astr      0.13
Adler     0.13
Britain   0.13
uml       0.13
jit       0.13
Activations Density: 0.163%