    Explanations

    updates and improvements related to software and user interactions

    Negative Logits
    Token            Logit
    " even"          -0.61
    " being"         -0.58
    " especially"    -0.58
    " just"          -0.55
    " actually"      -0.54
    " either"        -0.51
    " such"          -0.51
    " maybe"         -0.49
    " when"          -0.48
    " only"          -0.48
    Positive Logits
    0.90
    了一
    0.86
    了自己的
    0.82
    了一个
    0.81
    起来
    0.79
    了两
    0.77
    了一個
    0.76
    了自己
    0.76
    0.76
    了他
    0.76
    Act Density 0.023%

    No Known Activations