Explanations
personal statements or observations
Negative Logits
+  0.58
olacaktır  0.46
ंसाठी  0.45
kullanılır  0.43
अनुप्रयोग  0.43
eşit  0.42
çarp  0.42
komplex  0.42
приложений  0.42
hidrat  0.42
Positive Logits
ਾਨੂੰ  0.54
समाजसेवी  0.52
ಅಧಿಕಾರಿ  0.51
resentment  0.51
unwillingness  0.50
لوگوں  0.50
deplorable  0.50
policemen  0.49
हालात  0.49
ಜನರು  0.48
Activations Density 0.006%