Explanations
updates and improvements related to software and user interactions
Negative Logits
even        -0.61
being       -0.58
especially  -0.58
just        -0.55
actually    -0.54
either      -0.51
such        -0.51
maybe       -0.49
when        -0.48
only        -0.48
Positive Logits
了          0.90
了一        0.86
了自己的    0.82
了一个      0.81
起来        0.79
了两        0.77
了一個      0.76
了自己      0.76
着          0.76
了他        0.76
Activations Density: 0.023%