INDEX
    Explanations
    Negative Logits
    Token             Logit
    "themselves"      -0.52
    " Chord"          -0.51
    " purpoſe"        -0.50
    " ſtate"          -0.49
    " ſta"            -0.48
    " pouvoit"        -0.47
    "stick"           -0.46
    " perſon"         -0.46
    "Preference"      -0.46
    "bowl"            -0.46
    Positive Logits
    Token             Logit
    "цездатний"       0.56
    "Its"             0.56
    " egne"           0.55
    " nahilalakip"    0.54
    " Its"            0.54
    "oa̍t"             0.50
    "它的"            0.48
    "esas"            0.48
    "inerja"          0.47
    "NUMX"            0.47
    Activation Density: 0.117%

    No Known Activations