INDEX
    Explanations

    references to ideal conditions or scenarios

    Negative Logits
    Token         Logit
    "["           -0.73
    (not shown)   -0.69
    (not shown)   -0.67
    " ["          -0.64
    "("           -0.63
    "ET"          -0.61
    (not shown)   -0.61
    "<eos>"       -0.58
    "…."          -0.58
    "..."         -0.57

    Positive Logits
    Token         Logit
    " ideal"      2.16
    "ideal"       2.14
    " Ideal"      2.09
    " IDEAL"      2.08
    "Ideal"       2.05
    " idéal"      1.82
    " idéale"     1.80
    " ideale"     1.78
    " ideales"    1.65
    "理想"        1.50
    Activation Density: 0.064%

    No Known Activations