    Explanations

    mentions of the color white and related concepts

    Negative Logits
    Token                 Logit
    " queſta"             -2.50
    "<unused43>"          -2.45
    "<unused41>"          -2.44
    "<unused23>"          -2.44
    "<unused74>"          -2.44
    "<unused42>"          -2.44
    "[@BOS@]"             -2.44
    "<unused14>"          -2.42
    "<unused8>"           -2.42
    "<unused3>"           -2.42
    Positive Logits
    Token                 Logit
    (no visible token)     1.68
    (no visible token)     1.52
    ","                    1.48
    "↵↵"                   1.43
    "-"                    1.42
    "."                    1.41
    " ("                   1.37
    "y"                    1.34
    (whitespace)           1.30
    " I"                   1.30
    Act Density 2.546%

    No Known Activations