    Negative Logits
    Token            Logit
    "th"             -0.81
    "er"             -0.72
    " Eras"          -0.68
    "ev"             -0.68
    " تضيفلها"       -0.62
    "Th"             -0.60
    "No"             -0.60
    "rd"             -0.60
    "pt"             -0.59
    "bus"            -0.58
    Positive Logits
    Token              Logit
    " prêtres"         0.59
    "aarrggbb"         0.57
    "SequentialGroup"  0.56
    "posób"            0.56
    "charpe"           0.54
    "nourriture"       0.54
    "Erreferentziak"   0.51
    "mặt"              0.48
    " rapporti"        0.48
    "Còn"              0.47
    Activation Density: 0.106%

    No Known Activations