INDEX
    Explanations

    phrases expressing opinions or viewpoints

    Negative Logits
     answ           -0.71
    mentioned       -0.68
    ramids          -0.67
    then            -0.64
    aunted          -0.63
    cies            -0.63
    ersen           -0.62
    rote            -0.62
    ITH             -0.62
    leaf            -0.61
    Positive Logits
     synonymous     0.95
     belonging      0.92
     unbeat         0.85
     credible       0.83
     unfit          0.83
     indispensable  0.82
     embod          0.81
     illegitimate   0.80
     unethical      0.79
     unworthy       0.78
    Act Density 1.001%

    No Known Activations