Explanations
phrases expressing possession or association
Negative Logits
urt      -0.16
era      -0.15
Všech    -0.14
etc      -0.14
oho      -0.14
Corm     -0.14
855      -0.14
ple      -0.14
γκ       -0.14
sắc      -0.14
Positive Logits
licken   0.15
inen     0.15
ecta     0.15
ror      0.14
ifr      0.14
uria     0.14
/th      0.13
minus    0.13
rait     0.13
ames     0.13
Activations Density 0.022%