INDEX
    Explanations
    Negative Logits
    " Sever"        -0.59
    "Sever"         -0.57
    "aze"           -0.56
    "moreland"      -0.54
    " severance"    -0.53
    " stanga"       -0.52
    "anzo"          -0.52
    "angler"        -0.52
    " épar"         -0.52
    " Crunch"       -0.52
    Positive Logits
    " film"         1.83
    " Film"         1.68
    "Film"          1.66
    "film"          1.64
    " FILM"         1.52
    "FILM"          1.45
    " films"        1.41
    " Films"        1.22
    "films"         1.18
    "Films"         1.14
    Activation Density: 0.015%

    No Known Activations