    Explanations

    words related to proper nouns or names

    Negative Logits
    " NATO"        -0.70
    " vert"        -0.66
    " eurozone"    -0.64
    "UE"           -0.62
    " sign"        -0.62
    " helium"      -0.62
    " makeup"      -0.61
    " Nato"        -0.60
    " reserve"     -0.59
    " Wings"       -0.59
    Positive Logits
    "har"          4.47
    "Har"          1.77
    "hari"         1.61
    "hur"          1.56
    "han"          1.52
    "hat"          1.44
    "hun"          1.35
    "kar"          1.32
    "haw"          1.31
    " Har"         1.27
    Activation Density 0.005%

    No Known Activations