INDEX
    Explanations

    phrases related to expressing opinions or beliefs

    concepts related to self-reflection and decision-making

    Negative Logits
    "Adds"        -0.65
    " Downing"    -0.64
    "Loading"     -0.61
    " Lub"        -0.60
    " Corpor"     -0.59
    " Drunk"      -0.58
    " Logan"      -0.57
    " Franco"     -0.57
    "major"       -0.57
    "Berry"       -0.57
    Positive Logits
    "iety"          0.80
    " hereafter"    0.78
    " aspire"       0.77
    "abouts"        0.74
    "yssey"         0.74
    "catentry"      0.72
    "posium"        0.70
    " thereafter"   0.70
    " thereof"      0.69
    "artney"        0.68
    Act Density 0.339%

    No Known Activations