INDEX
    Explanations

    actions related to attempts or efforts to achieve something

    Negative Logits
    Token          Logit
    "ãĢģ"          -0.17
    " ãĢģ"         -0.16
    "angel"        -0.16
    "ãĢģä¸Ń"       -0.15
    "ninger"       -0.15
    ".vendor"      -0.15
    " baÅŁta"      -0.14
    "Ñģим"         -0.14
    "__,__"        -0.14
    "ãĢģäºĮ"       -0.14
    Positive Logits
    Token          Logit
    " and"         0.42
    "-and"         0.37
    "_and"         0.33
    " And"         0.31
    "åĴĮ"          0.29
    " vÃł"         0.29
    " и"           0.28
    ".and"         0.27
    "ãģ¨"          0.27
    "\tand"        0.26
    Activation Density: 0.047%

    No Known Activations