    Negative Logits
    Token             Logit
    ' <=",'           -0.65
    ' abbot'          -0.64
    ' noqa'           -0.62
    'rawan'           -0.59
    ' /\.'            -0.58
    ' analisi'        -0.57
    'arcoma'          -0.57
    ' parson'         -0.56
    ' monastery'      -0.56
    ' monasteries'    -0.56
    Positive Logits
    Token             Logit
    ' names'          1.14
    ' name'           0.94
    ' Names'          0.91
    'names'           0.87
    ' NAMES'          0.87
    ' NAME'           0.78
    'Names'           0.74
    'NAMES'           0.70
    'apimachinery'    0.70
    ' brand'          0.69
    Activation Density: 0.222%

    No Known Activations