INDEX
    Explanations

    the word "loyal" with a high activation

    Negative Logits
    Token          Logit
    "EVA"          -0.74
    "OUT"          -0.71
    " Virus"       -0.67
    " Autism"      -0.66
    " Drugs"       -0.65
    " Schwarz"     -0.65
    " Genetics"    -0.65
    "FER"          -0.64
    " Clinic"      -0.64
    "LOD"          -0.64
    Positive Logits
    Token            Logit
    " loyal"         1.06
    "itiz"           1.05
    " allegiance"    0.95
    " loyalty"       0.93
    "alty"           0.91
    "ties"           0.91
    "atile"          0.88
    " adherent"      0.88
    "iciary"         0.87
    " servant"       0.85
    Activation Density: 0.007%

    No Known Activations