INDEX
    Explanations

    words related to opinions or beliefs

    Negative Logits
    Token                Logit
    "},"                 -0.75
    etter                -0.63
    artney               -0.62
    natureconservancy    -0.62
    ropolis              -0.62
    rosis                -0.61
    DN                   -0.61
    debian               -0.60
    dL                   -0.60
    uits                 -0.59
    Positive Logits
    Token          Logit
     yourselves    0.72
     yours         0.71
    beit           0.69
    ably           0.69
     kidding       0.66
     me            0.66
     guessing      0.63
    ingly          0.62
    fax            0.62
     thy           0.62
    Activation Density: 0.114%

    No Known Activations