INDEX
    Negative Logits
        "istoitu"       -0.69
        "<unused43>"    -0.66
        "<unused14>"    -0.66
        "<unused8>"     -0.66
        "<unused28>"    -0.66
        "<unused23>"    -0.66
        "[@BOS@]"       -0.66
        "<unused68>"    -0.66
        "<unused51>"    -0.66
        "<pad>"         -0.66
    Positive Logits
        (whitespace)    0.42
        " bright"       0.40
        " său"          0.36
                        0.34
                        0.33
        "<strong>"      0.33
        "bright"        0.33
        " cool"         0.32
        " &"            0.32
        "..."           0.31
    Act Density 0.048%

    No Known Activations