    Negative Logits
    Token           Logit
    " ActionTypes"  -0.10
    " wreck"        -0.09
    "〃"            -0.09
    " wre"          -0.09
    " Sap"          -0.09
    " Brut"         -0.09
    ";/*"           -0.08
    " serif"        -0.08
    "zano"          -0.08
    ");$"           -0.08
    Positive Logits
    Token           Logit
    "forall"        0.14
    " everyone"     0.13
    " forall"       0.12
    " others"       0.12
    "everyone"      0.12
    " towards"      0.11
    " vůči"         0.11
    " toward"       0.11
    " bagi"         0.10
    " всех"         0.10
    Activation Density: 0.053%

    No Known Activations