    Explanations

    bullet points with descriptions

    Negative Logits
    Token       Logit
     primero    1.05
     primeiro   1.04
     mistake    1.03
     phase      1.01
     dilemma    1.00
     below      0.98
     choice     0.98
     first      0.97
     crux       0.96
     প্রথমেই    0.94
    Positive Logits
    Token       Logit
    </i>        1.66
    <eos>       1.52
    "].         1.39
                1.38
    .</         1.35
    ].          1.25
    </          1.25
     ].         1.22
    </li>       1.20
    </em>       1.19
    Activation Density: 0.389%

    No Known Activations