Explanations
references to "proof" and related concepts
Negative Logits
fax        -0.16
odge       -0.15
/Area      -0.15
inux       -0.14
unch       -0.14
夫         -0.14
ENCED      -0.14
æĬŀ        -0.14
ê»ĺ        -0.14
uman       -0.14
Positive Logits
reading    0.33
reader     0.25
read       0.20
-positive  0.19
READING    0.18
ed         0.18
iness      0.18
positive   0.18
Positive   0.18
enstein    0.18
Activations Density: 0.017%