Explanations
lists separated by commas and 'and'
Negative Logits
n    1.61
adsor    1.26
одежды    1.25
ﺝ    1.25
다른    1.21
erdale    1.19
entimes    1.18
शेखर    1.17
торое    1.17
benda    1.16
Positive Logits
НА    1.09
Más    1.05
Für    1.04
Así    0.99
Сьогодні    0.99
ATIV    0.95
Η    0.95
Nella    0.93
РА    0.93
ENT    0.91
Activations Density 0.266%