INDEX
Explanations
references to specific individuals, locations, and cultural products
Negative Logits
Александр   -0.20
ller        -0.19
inář        -0.18
пион        -0.17
доктор      -0.17
τής         -0.17
ajan        -0.17
пользоват   -0.17
ισμός       -0.17
orer        -0.16
Positive Logits
ителя       0.23
вана        0.22
ателя       0.22
iego        0.22
ектора      0.21
mana        0.20
onga        0.20
Paula       0.20
ана         0.20
Arth        0.19
Activations Density 0.107%