INDEX
    Explanations
    Negative Logits
    Token             Logit
    "rap"             -0.07
    " revolutionary"  -0.07
    " enum"           -0.06
    " design"         -0.06
    (token not shown) -0.06
    "ibr"             -0.06
    "_power"          -0.06
    " Rebels"         -0.06
    "_using"          -0.06
    " men"            -0.06

    Positive Logits
    Token             Logit
    "ΑΚ"              0.07
    " Emma"           0.07
    "ницип"           0.06
    " miglior"        0.06
    "enerima"         0.06
    "Oak"             0.06
    " тверд"          0.06
    "Ba"              0.06
    " пог"            0.06
    "marginLeft"      0.06
    Activation Density: 1.003%

    No Known Activations