    Explanations

    contrasting phrases or opinions

    Negative Logits
    "ocol"       -0.16
    "ewis"       -0.15
    "kaar"       -0.15
    ".inline"    -0.15
    "lexport"    -0.14
    "μεν"        -0.14
    "strup"      -0.14
    "rott"       -0.14
    "genden"     -0.14
    "audi"       -0.13

    Positive Logits
    "enton"       0.16
    "rin"         0.14
    "rone"        0.14
    "Ŀ"           0.14
    "ruh"         0.14
    "archy"       0.14
    " bab"        0.13
    "ler"         0.13
    " Coupe"      0.13
    " bro"        0.13

    Activation density: 0.076%

    No Known Activations