INDEX
    Explanations

    occurrences of specific formatting markers or placeholders in text

    Negative Logits
    +#+#
    -1.13
     autorytatywna
    -0.96
     CanadaChoose
    -0.94
    Hentet
    -0.94
     nahilalakip
    -0.93
     cherchés
    -0.92
    Rüyada
    -0.92
     EconPapers
    -0.92
     ffilmiau
    -0.90
     Signalez
    -0.90
    Positive Logits
    ↵↵
    0.96
      
    0.68
    0.68
    ↵↵↵
    0.67
    <eos>
    0.60
    ↵↵↵↵↵
    0.52
     However
    0.50
     .
    0.49
    ↵↵↵↵
    0.48
       
    0.47
    Activation Density 0.021%

    No Known Activations