INDEX
    Explanations

    references to elements or components in a structured format

    Negative Logits
     للاسماء      -0.97
    <unused68>    -0.93
    <unused28>    -0.93
    [@BOS@]       -0.93
    <unused74>    -0.93
    <unused79>    -0.93
    <unused14>    -0.92
    <unused16>    -0.92
    <unused8>     -0.92
    <unused3>     -0.92
    Positive Logits
    ]               0.40
    %               0.36
    and             0.36
    ...             0.35
    (blank)         0.35
    (blank)         0.34
    .               0.34
    1               0.33
    (blank)         0.32
    (whitespace)    0.32
    Activation Density: 0.356%

    No Known Activations