    NEGATIVE LOGITS
    Token               Logit
    " kontakte"         -0.07
    "&id"               -0.07
    "_contrib"          -0.07
    "_Player"           -0.07
    "/md"               -0.06
    ".FILL"             -0.06
    "_DA"               -0.06
    "/videos"           -0.06
    (no token shown)    -0.06
    "/bower"            -0.06

    POSITIVE LOGITS
    Token               Logit
    " but"               0.10
    "but"                0.08
    " αλλά"              0.07
    "truth"              0.07
    " BUT"               0.07
    " just"              0.07
    " Hồ"                0.06
    "Model"              0.06
    "PU"                 0.06
    " fought"            0.06

    Activation Density: 0.066%

    No Known Activations