    Negative Logits
     heroic       -0.07
    Cargo         -0.07
    "↵↵↵          -0.07
     yolc         -0.07
    UTES          -0.07
    )',           -0.06
     calibrated   -0.06
    Contrib       -0.06
     soo          -0.06
    __()          -0.06
    Positive Logits
     nasty        0.33
     nast         0.13
    asty          0.12
     нак          0.08
     pleasant     0.08
                  0.08
                  0.08
     wicked       0.08
     unpleasant   0.08
    innitus       0.07
    Activation Density: 0.003%

    No Known Activations