    Explanations

    terms related to political and economic structures or resources

    Negative Logits
        "OTA"              -0.16
        " Fir"             -0.16
        "wy"               -0.15
        "еÑĢÑĸ"            -0.15   (decodes to "ері")
        "_capabilities"    -0.14
        "ÑijÑĢ"            -0.14   (decodes to "ёр")
        "λÏī"              -0.14   (decodes to "λω")
        ".Misc"            -0.14
        ".misc"            -0.14
        "_BINARY"          -0.14
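Several token strings above are in byte-level BPE form: each UTF-8 byte is shown as a printable stand-in character, so "ÑijÑĢ" is really the bytes D1 91 D1 80 ("ёр"). A minimal sketch of the reverse mapping, assuming the model uses GPT-2's byte-level vocabulary (the page does not name the model). The fallback for characters outside the table (such as "λ" and "е", which look already decoded by the page) is an assumption, not part of the original scheme.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Remaining bytes (space, control bytes, etc.) are shifted to U+0100 and up, in order.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytearray()
    for ch in token:
        if ch in char_to_byte:
            raw.append(char_to_byte[ch])
        else:
            # Assumption: characters outside the table were already decoded
            # by whatever rendered the page, so re-encode them as UTF-8.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


for tok in ["еÑĢÑĸ", "ÑijÑĢ", "λÏī", "Ä±ÅŁÄ±"]:
    print(repr(tok), "->", decode_token(tok))
```

The same table explains the leading spaces in " Fir", " Giles" and " Model": a space byte is stored as "Ġ" (U+0120) in the vocabulary.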
    Positive Logits
        "abee"             0.17
        "uche"             0.17
        "923"              0.17
        " Giles"           0.15
        "Ä±ÅŁÄ±"           0.15   (decodes to "ışı")
        "689"              0.14
        "ies"              0.14
        " Model"           0.14
        "885"              0.14
        "olk"              0.14
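Dashboards of this kind commonly build the two lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The page does not state its method, so this is a minimal sketch under that assumption, in plain Python with toy 2 x 4 matrices and hypothetical token names (alpha, beta, gamma, delta).

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k lowest and k highest (token, score) pairs.

    decoder_dir: feature direction in the residual stream (length d_model).
    unembed: d_model x vocab_size matrix mapping residual space to logits.
    """
    d_model = len(decoder_dir)
    # Score for token j = dot product of the direction with column j of unembed.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return negative, positive


# Toy example (hypothetical numbers, not taken from the page).
direction = [1.0, -0.5]
W_U = [
    [0.25, -0.25, 0.0, 0.5],
    [0.0, 0.5, -0.25, 0.25],
]
negative, positive = logit_effects(direction, W_U, ["alpha", "beta", "gamma", "delta"], k=2)
print("negative:", negative)
print("positive:", positive)
```

Because the scores depend only on fixed weights, not on any input text, the lists can be shown even when no activating examples are available.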
    Activation Density: 0.008%
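Activation density is usually reported as the fraction of sampled token positions on which the feature fires; 0.008% works out to roughly 8 firings per 100,000 tokens. A minimal sketch, assuming "fires" means an activation above 0 and using a made-up activation list (the page gives neither its threshold nor its sample size):

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 8 of them.
acts = [0.0] * 100_000
for pos in (10, 2_500, 17_000, 33_333, 48_000, 61_234, 80_000, 99_999):
    acts[pos] = 1.7
density = activation_density(acts)
print(f"{density:.3%}")
```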

    No Known Activations