INDEX
    Explanations
    Negative Logits
     so            -0.68
     in            -0.66
     as            -0.65
     tupperware    -0.64
     im            -0.62
    占用            -0.62
     he            -0.61
     sign          -0.61
     let           -0.60
     her           -0.60
    Positive Logits
     alkoh         1.24
     silikon       1.23
     antik         1.23
     kafe          1.21
     kosme         1.19
     praktik       1.18
     optik         1.15
     mikrofon      1.14
     keramik       1.14
     panik         1.14
    Activation Density 0.247%

    No Known Activations