INDEX
    Explanations

    names of political or public figures

    isolated single letters or characters, particularly at the beginning of words

    Negative Logits
    Token                    Logit
    "artifacts"              -0.68
    "hett"                   -0.63
    " CoC"                   -0.63
    "ashtra"                 -0.62
    "ELF"                    -0.61
    "channelAvailability"    -0.61
    "TextColor"              -0.59
    "lasses"                 -0.58
    "Measure"                -0.58
    "AUD"                    -0.58
    Positive Logits
    Token                    Logit
    "uala"                   0.88
    " Alvarez"               0.85
    " Camer"                 0.80
    "icz"                    0.75
    "wu"                     0.74
    "uma"                    0.72
    "oglu"                   0.70
    "ulum"                   0.70
    " Philippe"              0.68
    " Tec"                   0.65
    Act Density 0.220%

    No Known Activations