INDEX
    Explanations

    occurrences of the word "and"

    Negative Logits
    Token       Logit
    "ounty"     -0.17
    "eczy"      -0.17
    "riad"      -0.16
    "ricks"     -0.15
    "orsk"      -0.15
    "lew"       -0.15
    "oun"       -0.15
    "rick"      -0.14
    "ereum"     -0.14
    "_PK"       -0.14

    Positive Logits
    Token          Logit
    "ιλο"          0.16
    "ITT"          0.15
    " ALS"         0.15
    "im"           0.14
    "incl"         0.14
    "erk"          0.14
    "ife"          0.14
    "ipar"         0.14
    " Achilles"    0.14
    "vip"          0.13

    Activation Density: 0.321%

    No Known Activations