INDEX
    Explanations
    NEGATIVE LOGITS
    Token            Logit
    " lifetime"      -2.00
    "lifetime"       -1.72
    " lifespan"      -1.66
    " life"          -1.64
    "Lifetime"       -1.64
    " Lifetime"      -1.63
    "life"           -1.52
    " lifetimes"     -1.51
    "LIFE"           -1.41
    " lifestyle"     -1.41
    POSITIVE LOGITS
    Token            Logit
    " of"            0.64
    "of"             0.44
    "   "            0.43
    ","              0.41
    "FT"             0.41
    "  "             0.40
    "Modification"   0.39
    "."              0.38
    "мак"            0.37
    " decided"       0.37
    Activation Density: 0.037%

    No Known Activations