INDEX
    Explanations

    names or titles within text

    mentions of the word "name."

    Negative Logits
    Token         Logit
    "yrinth"      -0.87
    "romy"        -0.79
    "psey"        -0.71
    "EMS"         -0.70
    " gif"        -0.70
    "isexual"     -0.68
    "iaries"      -0.68
    "elaide"      -0.68
    "icult"       -0.66
    "Js"          -0.65
    Positive Logits
    Token            Logit
    "plates"         1.27
    "plate"          1.26
    "paces"          1.01
    " recognition"   0.86
    "ames"           0.85
    "akes"           0.84
    "brand"          0.83
    " tag"           0.81
    " aliases"       0.79
    "lier"           0.78
    Activation Density: 0.034%

    No Known Activations