INDEX
    Explanations

    phrases related to giving advice or instructions

    Negative Logits
    ufficient      -0.66
    cup            -0.60
    ashington      -0.58
    ighed          -0.56
    luent          -0.56
    ounge          -0.54
    cious          -0.54
    worthiness     -0.53
    aneous         -0.52
     Core          -0.51
    Positive Logits
     rant          0.64
     denial        0.64
     reorgan       0.64
     tir           0.62
     looting       0.60
    姫             0.59
     adventures    0.59
     reckless      0.58
    Ô              0.58
     warr          0.56
    Act Density 21.874%

    No Known Activations