    Negative Logits
        Token            Logit
        " Shakespeare"   -0.07
        "ória"           -0.07
        " Gesch"         -0.07
        "ğan"            -0.07
        " Sır"           -0.06
        " mistr"         -0.06
        " Heller"        -0.06
        " гла"           -0.06
        " Garr"          -0.06
        "bere"           -0.06
    Positive Logits
        Token            Logit
        " unit"          0.20
        " Unit"          0.16
        "Unit"           0.16
        " units"         0.16
        "unit"           0.14
        " Units"         0.12
        "_unit"          0.12
        "-unit"          0.12
        "units"          0.12
        "Units"          0.11
    Activation Density: 0.024%

    No Known Activations