INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    "links"         -0.69
    "gat"           -0.68
    " Tasmania"     -0.67
    "uten"          -0.66
    "posts"         -0.64
    " Albania"      -0.64
    "rade"          -0.62
    " Luffy"        -0.62
    " Judaism"      -0.62
    "aji"           -0.61
    POSITIVE LOGITS
    Token               Logit
    " same"             1.42
    " latter"           1.39
    " aforementioned"   1.27
    " entire"           1.22
    " entirety"         1.14
    "oret"              1.13
    " slightest"        1.13
    "ses"               1.13
    " latest"           1.08
    " whole"            1.06
    Activation Density: 0.081%

    No Known Activations