INDEX
    Explanations

    citation references and numerical data typically found in academic articles

    Negative Logits
    "idge"         -0.17
    "ech"          -0.17
    "hood"         -0.14
    "eree"         -0.14
    "lung"         -0.14
    "iew"          -0.14
    "innie"        -0.14
    " Bar"         -0.14
    " Fo"          -0.14
    "ousse"        -0.14

    Positive Logits
    " sup"          0.18
    " Sup"          0.16
    "-sup"          0.16
    "upo"           0.16
    "ì¶ķ"           0.15
    "ftime"         0.15
    "ogne"          0.15
    " subtotal"     0.15
    "imoto"         0.14
    "ogg"           0.14
    Activation Density: 0.057%

    No Known Activations