INDEX
Explanations
possessive pronouns indicating ownership or belonging
Negative Logits
ctr          -0.15
kiem         -0.15
izon         -0.15
heimer       -0.15
stra         -0.15
YRO          -0.14
_            -0.14
.Retrofit    -0.14
imoto        -0.14
ÑģÑĤиÑĩ       -0.14
Positive Logits
extent       0.26
extent       0.19
linky        0.18
extents      0.18
tune         0.17
Extent       0.16
detriment    0.15
*)((         0.15
_extent      0.15
æ¼Ķ          0.14
Activations Density: 0.042%