    Explanations

    repeated mentions of flags and their statuses

    Negative Logits
    "}")        -0.69
    ."</        -0.65
    */)         -0.65
     }}$}       -0.64
    😚          -0.63
    ]")         -0.63
    \"");       -0.63
    ymce        -0.63
    INSTANCE    -0.61
    */}         -0.61

    Positive Logits
     flag       2.96
     Flag       2.92
    flag        2.84
     flags      2.78
     FLAG       2.71
    Flag        2.68
     Flags      2.50
    FLAG        2.43
    Flags       2.28
    flags       2.25
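
    The ten positive-logit tokens above differ only in case, leading whitespace, and plural form. A minimal Python sketch makes that grouping explicit. It assumes the tokens and values are transcribed by hand from the list above; the names `positive_logits` and `normalize` are hypothetical and not part of any dashboard API.

```python
# Tokens and logit values transcribed from the positive-logit list above.
# A leading space is part of the token and is kept deliberately.
positive_logits = [
    (" flag", 2.96), (" Flag", 2.92), ("flag", 2.84), (" flags", 2.78),
    (" FLAG", 2.71), ("Flag", 2.68), (" Flags", 2.50), ("FLAG", 2.43),
    ("Flags", 2.28), ("flags", 2.25),
]


def normalize(token: str) -> str:
    """Drop surrounding whitespace, lowercase, and strip a trailing plural 's'."""
    word = token.strip().lower()
    return word[:-1] if word.endswith("s") else word


# Collapse every token to its stem; a single stem means one underlying word.
stems = {normalize(tok) for tok, _ in positive_logits}

# Compare the variants that carry a leading space with those that do not.
with_space = [v for tok, v in positive_logits if tok.startswith(" ")]
without_space = [v for tok, v in positive_logits if not tok.startswith(" ")]
mean_with_space = sum(with_space) / len(with_space)
mean_without_space = sum(without_space) / len(without_space)

print(stems)
print(mean_with_space, mean_without_space)
```

    Every token collapses to the single stem `flag`, which is consistent with the explanation given for this feature. In this list the variants with a leading space also score higher on average than those without.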
    Activation Density: 0.035%

    No Known Activations