    Explanations

    sections of text related to scientific evaluation and methodology

    Negative Logits
    Token             Logit
    " something"      -0.74
    "something"       -0.69
    " lainnya"        -0.65
    " anything"       -0.65
    "anything"        -0.64
    " autre"          -0.64
    " addirittura"    -0.63
    " tudo"           -0.63
    " semuanya"       -0.61
    " muchísimo"      -0.61
    Positive Logits
    Token             Logit
    " selected"       1.20
    " ausgewä"        0.96
    " various"        0.96
    " Selected"       0.96
    "selected"        0.93
    " SELECTED"       0.92
    "Selected"        0.85
    "various"         0.83
    " select"         0.82
    "SELECTED"        0.81
    Activation Density: 2.278%

    No Known Activations