    Negative Logits
    Token             Logit
    "âĹ¼"             -0.67
    "EStreamFrame"    -0.67
    " afore"          -0.63
    " Mayhem"         -0.62
    "tnc"             -0.59
    "Reviewer"        -0.58
    " Volume"         -0.58
    " Phase"          -0.58
    " Anarchy"        -0.57
    " Goat"           -0.56
    Positive Logits
    Token             Logit
    "'t"              1.47
    "ned"             1.18
    "ates"            1.02
    "ning"            0.98
    "ate"             0.96
    "atives"          0.95
    "atell"           0.94
    "uts"             0.94
    "nell"            0.92
    "etsk"            0.91
    Activation Density: 0.647%

    No Known Activations