INDEX
Explanations
affirmative answers to questions
Negative Logits
دانشنامهٔ          -0.54
للمعارف            -0.47
Hentet             -0.43
dė                 -0.40
respe              -0.39
DockStyle          -0.38
AssemblyCulture    -0.37
TestingModule      -0.37
jenost             -0.37
PeEnEo             -0.36
Positive Logits
answer             1.14
answers            0.96
answer             0.94
Answer             0.85
answered           0.84
antwoord           0.82
回答               0.79
Answer             0.78
answers            0.77
答案               0.77
Activations Density: 0.559%