INDEX
    Explanations

    quotes and dialogue markers within the text

    Negative Logits
    \""
    -0.97
    }}"
    -0.93
    osoba
    -0.89
    {}".
    -0.87
      “
    -0.86
    (",")
    -0.86
     "}
    -0.85
     Menge
    -0.84
     $("<
    -0.83
    ]["
    -0.81
    Positive Logits
     '
    1.09
    !='
    1.08
    Ndr
    0.98
     ='
    0.94
     ('
    0.94
    +'
    0.94
    ==='
    0.93
    >';
    0.93
    =’
    0.91
     '.
    0.91
    Act Density 0.247%

    No Known Activations