INDEX
Explanations
Phrases about perceived intelligence or competence in the context of social and political issues
Negative Logits
SharedCtor        -0.64
Numerade          -0.56
AndEndTag         -0.55
ثيق               -0.54
TestBed           -0.52
DUT               -0.49
bors              -0.49
SHOWING           -0.48
recognising       -0.48
<<<<<<<<<<<<<<    -0.47
Positive Logits
utafitiHapana     0.66
somehow           0.61
صوتيه             0.59
magically         0.56
stateProvider     0.55
ویکیپدیا          0.49
Universitaria     0.49
tropicales        0.49
raszamy           0.49
economica         0.48
Activations Density 0.180%