    Explanations

    names or proper nouns ending in 'ly'

    Negative Logits
    ilater        -0.91
    aciously      -0.84
    ifully        -0.82
    ilogy         -0.75
    artifacts     -0.72
    itive         -0.70
    irlf          -0.70
    ilaterally    -0.69
     indo         -0.69
    arsity        -0.69
    Positive Logits
    tics          1.13
    rics          1.05
    phant         0.94
    mph           0.92
    rical         0.87
    sis           0.87
    ndra          0.84
    ffe           0.82
    lene          0.82
    nda           0.81
    Activation Density: 0.038%

    No Known Activations