INDEX
    Explanations

    numerical data and references in academic or technical contexts

    NEGATIVE LOGITS
    Token   Logit
    227     -0.17
    233     -0.16
    235     -0.15
     Bias   -0.15
    175     -0.15
    274     -0.15
    243     -0.15
     bias   -0.15
    154     -0.15
    ike     -0.15

    POSITIVE LOGITS
    Token   Logit
    950     0.36
    956     0.35
    954     0.35
    966     0.34
    900     0.34
    958     0.34
    996     0.33
    920     0.33
    953     0.33
    960     0.33

    Activation Density: 0.153%

    No Known Activations