    Explanations

    terms related to abstraction and abstract concepts

    Negative Logits
    уча
    -0.17
    лас
    -0.16
    одо
    -0.16
     -------------------------------------------------------------------------↵
    -0.16
    ern
    -0.15
    shaw
    -0.15
    )prepare
    -0.15
    unta
    -0.15
    ermo
    -0.15
    agra
    -0.14
    Positive Logits
    ed
    0.31
    edly
    0.24
    ivism
    0.23
    ified
    0.23
    ively
    0.20
    ivist
    0.19
    -syntax
    0.19
    s
    0.18
    ing
    0.18
    ly
    0.17
    Activation Density 0.017%

    No Known Activations