INDEX
    Explanations

    phrases indicating specific types or categories of things

    Negative Logits
    Token       Logit
    " types"    -0.20
    " kinds"    -0.19
    "Types"     -0.18
    "elsen"     -0.17
    " Types"    -0.16
    "_types"    -0.16
    "-types"    -0.15
    " sorts"    -0.15
    "uhan"      -0.15
    "types"     -0.14
    Positive Logits
    Token           Logit
    "thing"         0.24
    " thing"        0.23
    " behaviour"    0.15
    "事情"          0.15
    " warfare"      0.15
    " behavior"     0.15
    "데"            0.15
    " thinking"     0.15
    " activity"     0.15
    "IDI"           0.15
    Act Density 0.075%

    No Known Activations