INDEX
    Explanations

    words and phrases indicating personal reflections or evaluations

    Negative Logits
    " Leer"          -0.14
    "UC"             -0.14
    "æīĺ"            -0.14
    " Corner"        -0.14
    "ross"           -0.14
    "κά"             -0.14
    " chain"         -0.14
    "ĥģ"             -0.13
    " convention"    -0.13
    "heim"           -0.13
    Positive Logits
    "fone"       0.17
    "uso"        0.16
    "'gc"        0.15
    "/styles"    0.15
    " edin"      0.14
    "kins"       0.14
    "ohl"        0.14
    "ovÃŃ"       0.14
    "_invoke"    0.14
    "olver"      0.14
    Act Density 0.027%

    No Known Activations