INDEX
    Explanations

    instances of a specific formatting pattern (Ċ followed by a number)

    titles or headings related to various topics, particularly in structured formats like lists or instructions

    Negative Logits
    hement      -0.79
     resc       -0.67
     speeding   -0.66
     citiz      -0.65
     subpoen    -0.63
     Sind       -0.62
     Saras      -0.62
     neighb     -0.61
     Singh      -0.60
    isconsin    -0.59
    Positive Logits
    ³³³³³³³³³³³³³³³³  0.95
    Spoiler           0.89
    http              0.88
    ³³³³              0.82
    ³³³³³³³³          0.82
    Unknown           0.81
    Reward            0.80
    https             0.80
    âĹı               0.80
    Ingredients       0.77
    Activation Density: 0.126%

    No Known Activations