INDEX
    Negative Logits
     GFP          -0.08
     mentality    -0.07
     Å            -0.07
     Athena       -0.07
    back          -0.07
    ':'           -0.07
    	App         -0.07
     Hanson       -0.07
    cida          -0.07
     sheet        -0.07
    Positive Logits
    ==============================================================================    0.09
                  0.08
    .int          0.08
     Pret         0.08
    .INT          0.08
     pár          0.08
     explains     0.08
    Pret          0.08
    abcdefgh      0.08
    .easy         0.08
    Activation Density: 0.016%

    No Known Activations