INDEX
    Explanations

    phrases related to expressing opinions or giving advice

    Negative Logits
    xxxx        -0.70
    chnology    -0.67
    ãģł         -0.65
    ãĥīãĥ©      -0.65
    ãĤº         -0.64
    asaki       -0.59
    circle      -0.59
    ļéĨĴ        -0.59
    iously      -0.56
    abama       -0.56

    Positive Logits
     he          1.46
     she         1.28
    she          0.99
     said        0.97
     according   0.89
    he           0.82
    said         0.82
     says        0.81
     He          0.80
    He           0.77
    Act Density 0.280%

    No Known Activations