    Negative Logits
    Token          Logit
    " cultura"     -0.07
    "-city"        -0.07
    " plenty"      -0.07
    " Bever"       -0.07
    "icana"        -0.07
    "assemble"     -0.07
    "’y"           -0.07
    "leet"         -0.06
    "LEY"          -0.06
    "oust"         -0.06

    Positive Logits
    Token          Logit
    " Ref"          0.16
    "Ref"           0.16
    " ref"          0.15
    "ref"           0.14
    ".Ref"          0.13
    "(ref"          0.12
    "\tref"         0.12
    " referees"     0.12
    " referee"      0.11
    " REF"          0.11
    Activation Density: 0.013%

    No Known Activations