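    Negative Logits
    Token        Logit    English gloss
    " we"        -1.70
    "We"         -1.03
    " We"        -1.02
    "我們"       -1.00    "we" (Chinese, traditional)
    "我们"       -0.92    "we" (Chinese, simplified)
    " мы"        -0.92    "we" (Russian)
    "we"         -0.89
    "我们就"     -0.76    "we then" (Chinese)
    " нами"      -0.75    "us" (Russian, instrumental case)
    " kita"      -0.75    "we" (Indonesian/Malay)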
    Positive Logits
    Token           Logit
    " are"          0.98
    " have"         0.83
    " believe"      0.74
    " seek"         0.73
    " operate"      0.72
    " rely"         0.71
    " intend"       0.71
    " advertise"    0.71
    " compensate"   0.70
    " strive"       0.69
    Activation density: 0.088%
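    An activation density of 0.088% means the feature fires on roughly 88 of every 100,000 token positions. A minimal sketch of that calculation, assuming density is the fraction of sampled positions whose activation exceeds zero; the helper `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # 88 firing positions out of 100,000 sampled tokens
    acts = [1.0] * 88 + [0.0] * (100_000 - 88)
    print(f"{activation_density(acts):.3%}")  # prints 0.088%
```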

    No Known Activations