INDEX
    Explanations

    quotations from individuals

    utilization of quotations in the text

    Negative Logits
    ーテ          -0.71
    venth       -0.69
    ーティ         -0.64
    ems         -0.59
    ensible     -0.58
    acly        -0.57
    filled      -0.57
    irrel       -0.56
     conflic    -0.55
    Honest      -0.55
    Positive Logits
    :            0.75
    :"           0.74
     goodbye     0.73
    :]           0.71
    :'           0.71
     "...        0.68
    jriwal       0.67
     Rohing      0.66
     sarcast     0.65
    ]:           0.65
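    The two lists above give the ten tokens with the most negative and the ten with the most positive logit values. Below is a minimal Python sketch of that top-k / bottom-k selection. It assumes a per-token score list aligned with the vocabulary is already available. This page does not say how those scores are computed, and the function name `top_and_bottom_logits` is hypothetical.

```python
import heapq

def top_and_bottom_logits(vocab, scores, k=10):
    """Return the k highest and k lowest (token, score) pairs.

    Assumption: `scores[i]` is the logit value for `vocab[i]`; how the
    dashboard derives these values is not stated on this page.
    """
    if len(vocab) != len(scores):
        raise ValueError("vocab and scores must have the same length")
    pairs = list(zip(vocab, scores))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])   # descending
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])  # ascending
    return positive, negative

# Toy check using six (token, value) pairs copied from the lists above.
vocab = [":", ":\"", " goodbye", "venth", "ems", "Honest"]
scores = [0.75, 0.74, 0.73, -0.69, -0.59, -0.55]
positive, negative = top_and_bottom_logits(vocab, scores, k=2)
print(positive)  # [(':', 0.75), (':"', 0.74)]
print(negative)  # [('venth', -0.69), ('ems', -0.59)]
```

    Leading spaces are kept as part of each token string (for example " goodbye"), matching how the lists above distinguish word-initial tokens.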
    Activation Density: 0.189%
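    An activation density of 0.189% means the feature fires on roughly 19 of every 10,000 tokens. Below is a minimal Python sketch of the calculation. It assumes density is the share of sampled tokens whose activation is above zero. The threshold, the token sample, and the name `activation_density` are assumptions and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: a token counts as active when its activation exceeds zero;
    the exact threshold and sample behind the 0.189% figure are unknown.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy sample: 2 active tokens out of 1,000 gives a density of 0.2%.
sample = [0.0] * 998 + [0.5, 1.2]
print(activation_density(sample))  # 0.2
```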

    No Known Activations