INDEX
    Negative Logits
    Token         Logit
    "modern"      -0.09
    " Grim"       -0.08
    "λογ"         -0.08
    "ALE"         -0.08
    "Modern"      -0.08
    "Gall"        -0.08
                  -0.08
    "-dark"       -0.08
    "Grass"       -0.07
    "Cant"        -0.07
    Positive Logits
    Token         Logit
    " uti"        0.08
    " críticos"   0.08
    " прих"       0.08
    " on"         0.08
    " Lobby"      0.08
    " rhetoric"   0.08
    " cru"        0.07
    "emony"       0.07
    " engineers"  0.07
    " contest"    0.07
    Activation Density: 0.000%

    No Known Activations