INDEX
    Explanations

    sentences indicating possession or ownership

    expressions of hope and positivity

    Negative Logits
    hops    -0.57
    lvl     -0.53
    ãĥ«     -0.51
    KING    -0.50
    ean     -0.50
    hare    -0.50
    obal    -0.49
    urs     -0.49
    hig     -0.49
    aux     -0.49
    Positive Logits
    .—      0.92
    !,      0.89
    .[      0.81
    ;       0.81
    !       0.80
    .ãĢį    0.79
    ,—      0.79
    .       0.79
    !.      0.78
    .(      0.77
    Activation Density 0.969%

    No Known Activations