INDEX
    Explanations

    semicolons and punctuation marks in the text

    Negative Logits
    " fran"        -0.76
    " of"          -0.68
    "alan"         -0.63
    " vol"         -0.60
    " up"          -0.59
    " dead"        -0.59
    "widetilde"    -0.59
    " de"          -0.57
    " glo"         -0.57
    "tps"          -0.57
    Positive Logits
    "$;"           1.79
    ";"            1.66
    "}$;"          1.64
    "+;"           1.57
    ".;"           1.56
    ";;;"          1.53
    "%;"           1.52
    ";;"           1.50
    "_;"           1.50
    ";;;;"         1.49
    Act Density 0.221%

    No Known Activations