Explanations
phrases indicating relational or possessive references
Negative Logits
ような    -0.19
お        -0.19
一切      -0.17
大        -0.16
子供      -0.16
一        -0.16
đây       -0.15
人気      -0.15
orem      -0.15
fan       -0.15
Positive Logits
sorts      0.38
course     0.32
course     0.27
           0.26
vido       0.25
-course    0.25
ftime      0.22
/from      0.21
/by        0.21
lox        0.20
Activations Density 1.823%