    Explanations

    references to cats or feline-themed terms

    Negative Logits
    idan      -0.19
    tz        -0.17
    eds       -0.16
    orer      -0.15
    eration   -0.15
    ureka     -0.14
    hower     -0.14
    738       -0.14
    HL        -0.14
    eping     -0.14
    Positive Logits
    égorie    0.26
    apult     0.25
    nip       0.19
    amount    0.19
    fish      0.18
    ting      0.17
    calls     0.17
    ucci      0.17
    -corner   0.17
    elog      0.17
    Activation Density: 0.044%

    No Known Activations