    Negative Logits
    Token            Logit
    "éĹĺ"            -0.83
    "DERR"           -0.79
    "esson"          -0.75
    "oulos"          -0.74
    " Edison"        -0.73
    "artz"           -0.73
    "erences"        -0.72
    "farious"        -0.70
    "ISTER"          -0.67
    "ORN"            -0.66
    Positive Logits
    Token            Logit
    "patch"           1.08
    " barking"        1.03
    "fighting"        1.02
    "meat"            1.00
    "fight"           0.99
    "gie"             0.96
    "fights"          0.94
    "fighter"         0.93
    "matic"           0.93
    "matically"       0.92
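    Lists like the two above are commonly produced by taking the dot product of the feature's decoder direction with each token's unembedding vector, then keeping the k most positive and k most negative values. The sketch below assumes that approach; the function names, the toy vectors, and the dictionary layout of the unembedding are hypothetical and chosen only for illustration, not taken from this page's tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers: assumes logit effects = decoder direction · unembedding column.

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens with their logit effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]              # most negative first
    top = ranked[::-1][:k]           # most positive first
    return top, bottom

if __name__ == "__main__":
    # Toy 2-dimensional example (hypothetical values).
    decoder_dir = [1.0, -0.5]
    unembed = {"fight": [1.0, 0.0], "meat": [0.5, 0.5], "DERR": [-1.0, 0.2]}
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
    print("positive:", top)
    print("negative:", bottom)
```

    In a real model the decoder direction and unembedding have the model's hidden width, and k would be 10 to match the tables above.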
    Activation Density: 0.030%
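    Activation density is usually read as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; the function name and the `threshold` parameter are hypothetical, and the input is a toy list rather than real activations for this feature.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total

if __name__ == "__main__":
    # Toy data: 3 nonzero activations out of 10,000 positions.
    acts = [0.0] * 9997 + [1.2, 0.4, 3.0]
    print(f"{activation_density(acts):.3f}%")
```

    Under this definition, a density of 0.030% corresponds to roughly 3 active positions per 10,000 tokens.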

    No Known Activations