INDEX
    Explanations

    gerunds and words related to roles and actions in a structured context

    Negative Logits
    "ings"     -0.23
    "guns"     -0.18
    "ality"    -0.16
    "lt"       -0.16
    "ng"       -0.15
    "zb"       -0.15
    "xs"       -0.15
    "zh"       -0.15
    "oul"      -0.15
    "iw"       -0.15
    Positive Logits
    " factor"     0.26
    " factors"    0.25
    "redient"     0.23
    "redients"    0.22
    "factor"      0.22
    " Factors"    0.21
    "-factor"     0.20
    " Factor"     0.19
    "_FACTOR"     0.18
    " force"      0.18
    Activation Density: 0.142%

    No Known Activations