INDEX
    Explanations

    phrases related to strong emotions or reactions

    emotional reactions and intense responses related to experiences

    Negative Logits
    Token           Logit
    mut             -0.66
    repl            -0.65
     merged         -0.64
    zero            -0.63
    married         -0.63
    ipped           -0.61
     substituted    -0.58
    fried           -0.57
     background     -0.57
    ouched          -0.56
    Positive Logits
    Token           Logit
    bies            0.75
     unnecessarily  0.72
     territ         0.70
    =-=-=-=-        0.68
    GGGGGGGG        0.68
    aughs           0.67
    iday            0.66
    Strange         0.64
    ãĢĤ             0.63
    ikes            0.63
    Act Density 0.229%

    No Known Activations
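
The activation density figure above is presumably the fraction of sampled token positions on which the feature fires. The sketch below shows one way such a number could be computed. It assumes that "fires" means an activation strictly above a threshold of zero; the dashboard does not state its definition. The function name `activation_density` and the sample data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater
    than the threshold. The dashboard's actual definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which are active.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.300%"
```

Under this reading, a density of 0.229% would mean the feature is active on roughly 2 or 3 of every 1000 sampled tokens.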