INDEX
    Explanations

    proper names, particularly those related to individuals and notable figures

    Negative Logits
    ilater     -0.90
    ulators    -0.81
    urities    -0.78
     indo      -0.76
    ulates     -0.73
    ular       -0.67
    imony      -0.67
    ulatory    -0.66
    ifier      -0.65
    ebus       -0.65
    Positive Logits
     Doyle     1.32
    oyle       1.19
    idge       0.87
    hyde       0.85
    hiba       0.85
    ragon      0.78
    weed       0.77
    mount      0.74
    brush      0.74
    gaard      0.73
    Act Density 0.008%

    No Known Activations