INDEX
Explanations
statements and assertions involving "the" and other definite references
Negative Logits
Token         Value
ãĥ³ãĤ¯        -0.14
olt           -0.14
burg          -0.14
rada          -0.14
.eng          -0.13
бина          -0.13
possibility   -0.13
ļĮ            -0.12
ãĤĩ           -0.12
deriv         -0.12
Positive Logits
Token         Value
reason        0.32
problem       0.26
key           0.22
Problem       0.21
issue         0.21
problem       0.20
main          0.20
thing         0.20
answer        0.19
trouble       0.19
Activations Density: 0.277%