    Explanations

    references to different viewpoints or ways of seeing a situation

    Negative Logits
    Token        Logit
    "ม"          -0.17
    "øj"         -0.17
    "linger"     -0.16
    "øy"         -0.16
    "richt"      -0.16
    "owi"        -0.15
    "ierz"       -0.15
    "ë²"         -0.15
    "erman"      -0.15
    "ling"       -0.15
    Positive Logits
    Token        Logit
    "ual"        0.23
    "ively"      0.21
    "ors"        0.19
    "us"         0.19
    " view"      0.19
    "-taking"    0.19
    ".ly"        0.18
    "ately"      0.18
    "ually"      0.18
    "pectives"   0.17
    Activation Density: 0.029%

    No Known Activations