INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
    " Devil"     -0.07
    "ija"        -0.06
    "/ui"        -0.06
    "_attrib"    -0.06
    " Whilst"    -0.06
    "ملكة"       -0.06
    " závis"     -0.06
    " vul"       -0.06
    " свят"      -0.06
    "ètre"       -0.06
    POSITIVE LOGITS
    Token        Logit
    " more"      0.28
    " More"      0.22
    " MORE"      0.19
    "More"       0.19
    "more"       0.18
    "MORE"       0.15
    " most"      0.14
    "-more"      0.13
    ".More"      0.13
    ".more"      0.12
    Activation Density: 0.193%

    No Known Activations