Explanations
Statements reflecting social insights or critiques of personal responsibility and societal issues
Negative Logits
transQ            -0.79
verifyException   -0.76
ьаж               -0.68
المناصب           -0.62
İstinadlar        -0.60
'][$              -0.57
RTDA              -0.55
Хьажоргаш         -0.54
للتو              -0.54
FailureListener   -0.53
Positive Logits
myſelf       0.67
purpoſe      0.65
ſeveral      0.65
themſelves   0.64
fubject      0.60
juſ          0.59
pleaſure     0.58
raiſ         0.57
perfons      0.57
occaf        0.56
Activations Density 0.026%