    Negative Logits
    Token               Logit
    " entanto"          -0.82
    " kerana"           -0.64
    " whereas"          -0.64
    " Biôgrafia"        -0.61
    " Gegenteil"        -0.60
    " whilst"           -0.58
    "IBarButtonItem"    -0.57
    " perchè"           -0.57
    " tandis"           -0.57
    "kháu"              -0.56
    Positive Logits
    Token               Logit
    " it"               1.31
    " there"            1.29
    " they"             1.24
    " we"               1.13
    " many"             1.12
    " most"             1.06
    " the"              1.02
    " he"               0.98
    " some"             0.92
    " no"               0.88
    Activation Density: 0.173%

    No Known Activations