INDEX
    Explanations

    words associated with quantifiable attributes or measurements

    Negative Logits
    urious        -0.15
    uras          -0.15
     neither      -0.15
     Bare         -0.15
    rh            -0.14
     condition    -0.14
    ech           -0.14
     Beer         -0.14
     Comb         -0.14
     bare         -0.14
    Positive Logits
    yme       0.15
    imler     0.15
    ãĥ«ãĥī    0.15
    park      0.15
    illet     0.15
    ermen     0.15
    .SizeF    0.14
    assen     0.14
    ighb      0.14
    acons     0.14
    Activation Density: 0.035%

    No Known Activations