INDEX
    Explanations

    references to specific points in time or indications of temporal context

    Negative Logits
    "yers"       -0.16
    " Sector"    -0.15
    "wor"        -0.15
    "ollo"       -0.15
    "urb"        -0.14
    "than"       -0.14
    "au"         -0.14
    "cky"        -0.14
    "jon"        -0.14
    "ög"         -0.14
    Positive Logits
    "uation"     0.16
    "λεκ"        0.15
    "osyal"      0.15
    "uate"       0.14
    "ulary"      0.14
    "uilder"     0.14
    " dét"       0.14
    "ãĥ¼ãĥį"     0.14
    ".deg"       0.14
    "Unnamed"    0.13
    Activation Density 0.010%

    No Known Activations