INDEX
Explanations
variation in the use of personal pronouns and common verbs
Negative Logits
undy        -0.15
raya        -0.14
พระราช      -0.13
Χα          -0.13
imet        -0.13
bane        -0.13
667         -0.13
rome        -0.13
ube         -0.13
            -0.13
Positive Logits
experience  0.15
kea         0.14
affected    0.14
esel        0.14
801         0.14
Lang        0.14
uppy        0.14
μο          0.14
Lang        0.14
claim       0.14
Activations Density 0.006%