    Explanations

    occurrences of a specific end-of-text token

    Negative Logits
        Token           Logit
        "grass"         -0.75
        " Guinness"     -0.72
        " Nieto"        -0.68
        " Totem"        -0.67
        " Granger"      -0.67
        " CPR"          -0.66
        " Dolphin"      -0.65
        " Sorceress"    -0.65
        " Pom"          -0.65
        " daylight"     -0.64
    Positive Logits
        Token           Logit
        "senal"          1.32
        "ansom"          1.19
        "ICH"            1.18
        "outing"         1.10
        "ascal"          1.09
        "acing"          1.08
        "abbit"          1.07
        "agnar"          1.07
        "outine"         1.07
        "haps"           1.07
    Activation Density: 0.031%

    No Known Activations