INDEX
    Explanations

    names of specific individuals

    Negative Logits
     for      -0.90
     and      -0.89
     at       -0.89
     in       -0.87
     on       -0.86
     or       -0.84
    ↵↵        -0.84
     to       -0.82
     but      -0.82
     as       -0.82
    Positive Logits
     alkoh     2.15
     Traité    2.06
     Sén       2.04
     Strukt    2.02
     embra     2.01
     simplif   1.98
     mef       1.93
     Lég       1.93
     dises     1.92
     Cfr       1.92
    Activation Density: 0.517%

    No Known Activations