INDEX
    Explanations

    mentions of "kind of" or phrases indicating classification or types

    Negative Logits
    chn         -0.16
    alus        -0.15
    ir          -0.14
     cooldown   -0.14
    ary         -0.13
    esis        -0.13
     Main       -0.13
     Fle        -0.13
     cogn       -0.13
     As         -0.13
    Positive Logits
    weise       0.17
    kova        0.16
    quot        0.15
    ëģĶ         0.15
    rome        0.15
    tras        0.15
    ofday       0.14
    olson       0.14
    zia         0.14
    leftright   0.14
    Act Density 0.045%

    No Known Activations