Negative Logits
Vladimir       -0.08
pension        -0.08
.extend        -0.08
PRI            -0.07
Princess       -0.07
alwa           -0.07
Fuse           -0.07
pyg            -0.07
积             -0.07   (Chinese: "accumulate")
czę            -0.07   (Polish word fragment)
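Positive Logits
controlled      0.13
Controlled      0.13
Controlled      0.12
controlled      0.12
gecontrole      0.11   (Dutch fragment, cf. "gecontroleerd", "controlled")
rigor           0.11
-controlled     0.11
Treatments      0.10
controlar       0.10   (Spanish/Portuguese: "to control")
эксперимент     0.10   (Russian: "experiment")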
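The two lists look like the extremes of a ranking of every vocabulary token by the feature's direct effect on that token's logit. The sketch below shows that ranking. It assumes, without confirmation from this dashboard, that the effect is the dot product of the feature's decoder direction with each token's unembedding column, ignoring any final layer norm. `decoder_dir`, `unembed`, and the toy values are all hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed direct effect on each token's logit: dot(decoder_dir, unembedding column)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]                # most negative first
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]  # most positive first
    return negative, positive


# Hypothetical toy model: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    "controlled": [0.10, 0.06],
    "rigor": [0.08, 0.06],
    "Vladimir": [-0.05, -0.06],
    "pension": [-0.04, -0.02],
}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", [tok for tok, _ in neg])
print("positive:", [tok for tok, _ in pos])
```

Sorting the full vocabulary once per list is simple and adequate for a sketch. For a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` / `heapq.nlargest` would avoid a full sort.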
Activation density: 0.027%
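Activation density is usually the fraction of sampled tokens on which the feature fires. This definition is assumed here, because the dashboard does not state it. At 0.027%, the feature fires on about 27 tokens per 100,000, roughly one token in 3,700. A minimal sketch, assuming non-negative (ReLU-style) activations where "fires" means activation > 0, on a hypothetical sample:

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, the feature fires on 27 of them.
acts = [0.0] * 100_000
for i in range(0, 27 * 1000, 1000):  # 27 evenly spaced firing positions
    acts[i] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.027%
```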