    Explanations

    phrases indicating knowledge or awareness

    Negative Logits
    Token             Logit
    "alance"          -0.07
    "ephy"            -0.06
    "urch"            -0.06
    "icher"           -0.06
    " Localization"   -0.06
    "ismus"           -0.05
    " Everyday"       -0.05
    "iceps"           -0.05
    "urai"            -0.05
    "zell"            -0.05
    Positive Logits
    Token             Logit
    " Masc"           0.07
    "ardo"            0.07
    "lys"             0.07
    "iture"           0.07
    ".scalablytyped"  0.07
    " السم"           0.07
    "CADE"            0.07
    "ligt"            0.07
    "_java"           0.07
    "FRING"           0.06
    Activation Density: 0.017%

    No Known Activations