    NEGATIVE LOGITS
    InjectAttribute      -0.62
    phones               -0.60
     AssemblyCulture     -0.60
     berätt              -0.59
    enheim               -0.57
     confé               -0.56
    âteau                -0.56
    NEZ                  -0.56
    <?                   -0.55
     varandra            -0.53
    POSITIVE LOGITS
    ized                  0.81
    ity                   0.69
    ization               0.66
    ised                  0.66
    ities                 0.64
    izing                 0.64
    ation                 0.60
    ated                  0.60
    ists                  0.57
    ally                  0.56
    Activation Density: 0.184%

    No Known Activations