INDEX
Explanations
references to notable historical figures and their contributions to science
Negative Logits
的一个    -0.19
dieser    -0.16
,它      -0.14
的一      -0.14
这种      -0.13
này       -0.13
ìĿ´ëٰ    -0.12
是一个    -0.12
bunu      -0.12
diese     -0.12
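Non-Latin entries in logit lists like the one above are sometimes stored as GPT-2-style byte-level strings (for example, `çļĦ` for `的`). The following is a minimal sketch of how such strings could be decoded. It assumes the tokenizer behind this dashboard uses the standard GPT-2 byte-to-character table, which the page itself does not state. `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode():
    """GPT-2-style table mapping each byte value (0-255) to one printable character."""
    # Bytes that are already printable map to themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    offset = 0
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to U+0100 and up.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level token string back to ordinary text (hypothetical helper)."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table are already decoded text.
            raw.extend(ch.encode("utf-8"))
    # Invalid byte sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["çļĦä¸Ģ", "nÃły", "dieser"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `dieser` pass through unchanged, so the helper can be applied to every entry in a list without checking its script first.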
POSITIVE LOGITS
the     1.06
the     0.76
the     0.66
_the    0.59
.the    0.52
,the    0.49
-the    0.48
teh     0.44
THE     0.42
ethe    0.37
Activations Density 2.012%