INDEX
    Explanations

    linguistics

    Negative Logits
     severely       -0.08
    .nt             -0.08
    Export          -0.08
     negatively     -0.08
    (not shown)     -0.08
     vasta          -0.08
     largely        -0.08
     drastically    -0.08
     er             -0.08
    .Export         -0.08
    Positive Logits
    表示            0.10
     verbs          0.09
    (not shown)     0.09
     표시           0.09
    (not shown)     0.09
    -cap            0.09
     nouns          0.09
     Wörter         0.08
     pouch          0.08
     lautet         0.08
    Act Density 0.008%

    No Known Activations