INDEX
    Explanations

    phrases indicating consistency and reliability

    Negative Logits
    "ArrowToggle"        -0.83
    "AnchorStyles"       -0.81
    "UnusedPrivate"      -0.73
    "protoimpl"          -0.72
    " allAfrica"         -0.70
    " <<<<<<<<<<<<<<"    -0.67
    "########."          -0.67
    " oprot"             -0.67
    " transfieras"       -0.65
    "WriteTagHelper"     -0.65
    Positive Logits
    "Consistency"        1.01
    " consistency"       0.99
    " Consistency"       0.97
    "consistency"        0.90
    " confusion"         0.88
    " inconsistency"     0.84
    "confusion"          0.80
    " compatibility"     0.78
    " momentum"          0.77
    "compatibility"      0.76
    Act Density 0.080%

    No Known Activations