Explanations
references to organizations and their activities
Negative Logits
ие         -0.18
kie        -0.16
uracy      -0.16
ccoli      -0.16
gal        -0.16
orie       -0.15
shire      -0.15
ary        -0.15
Ãłng       -0.15
ereotype   -0.15
Positive Logits
izational  0.36
isations   0.28
izations   0.25
igram      0.24
izers      0.23
Grinder    0.23
iser       0.23
omet       0.23
ized       0.23
za         0.22
Activations Density 0.009%