INDEX
    Explanations

    phrases indicating a sense of loss or change over time

    Negative Logits
    Token           Logit
    (not shown)     -0.66
    (not shown)     -0.65
    ↵↵              -0.61
     “              -0.60
     multiple       -0.58
     "              -0.57
     alternative    -0.57
    (not shown)     -0.54
    (whitespace)    -0.54
     the            -0.54
    Positive Logits
    Token           Logit
     ujednoznacz    1.19
    httphttps       1.16
    (not shown)     1.05
    <unused41>      1.05
    <unused43>      1.05
    <unused14>      1.04
    <unused17>      1.04
    <unused3>       1.04
    <pad>           1.04
    [@BOS@]         1.04
    Activation Density: 0.150%

    No Known Activations