    Explanations

    words related to support or endorsement

    Negative Logits
    " unbeliev"   -0.67
    "olars"       -0.65
    "itals"       -0.65
    "ancies"      -0.62
    "ouls"        -0.60
    " budgets"    -0.59
    " careers"    -0.58
    "eni"         -0.57
    "anders"      -0.57
    "uku"         -0.57

    Positive Logits
    " of"         1.02
    " thereof"    0.97
    "lier"        0.89
    " OF"         0.78
    "hesis"       0.78
    "hetical"     0.73
    " inhibitor"  0.72
    " Of"         0.71
    "Reviewer"    0.69
    "OF"          0.69

    Activation Density: 0.255%

    No Known Activations