INDEX
    Explanations

    names and references to historical figures in academia

    Negative Logits
    Token            Logit
    "igon"           -0.17
    " Dixon"         -0.15
    " collateral"    -0.14
    "(AT"            -0.14
    "UIL"            -0.14
    "WW"             -0.14
    " sitting"       -0.14
    " uv"            -0.14
    "xon"            -0.14
    "ite"            -0.13
    Positive Logits
    Token            Logit
    "LU"             0.19
    "BU"             0.17
    "kü"             0.17
    "doch"           0.16
    "kur"            0.16
    "PU"             0.16
    "CUS"            0.16
    "BUM"            0.15
    "łu"             0.15
    "λμ"             0.15
    Activation Density: 0.102%

    No Known Activations