INDEX
    Explanations
    Negative Logits
    Token           Logit
     Out            -0.07
    dated           -0.07
    大型            -0.07
     Encode         -0.07
                    -0.07
     Choir          -0.06
     bureauc        -0.06
    Pairs           -0.06
     tourists       -0.06
     safari         -0.06
    Positive Logits
    Token           Logit
                    0.07
    Cele            0.07
     able           0.07
     projectName    0.07
    見る            0.06
     anybody        0.06
    LING            0.06
    毫不犹豫        0.06
    (Activity       0.06
                    0.06
    Act Density 0.005%

    No Known Activations