INDEX
    Explanations

    days of the week

    Negative Logits
    .Head               -0.08
    -alt                -0.07
    (token not shown)   -0.07
    her                 -0.07
    相结合              -0.07
    Discover            -0.07
    拥挤                -0.07
    -threatening        -0.06
    nod                 -0.06
    沉重                -0.06
    Positive Logits
    zac                 0.07
     praw               0.07
    NetMessage          0.07
    (token not shown)   0.06
    .Bit                0.06
    صاب                 0.06
    _pref               0.06
     bigot              0.06
    あまり              0.06
    (token not shown)   0.06
    Activation Density: 0.025%

    No Known Activations