INDEX
    Explanations

    references to duality and comparisons between two entities

    Negative Logits
     all              -0.19
    242               -0.18
    onda              -0.17
    iera              -0.16
    relationships     -0.16
     fours            -0.16
     various          -0.15
     relationships    -0.15
    airs              -0.14
     friendships      -0.14
    Positive Logits
    -two              0.32
     two              0.29
    两个              0.28
     respectively     0.26
    分别              0.26
    two               0.26
    Two               0.25
     beide            0.25
     obou             0.25
     beiden           0.24
    Act Density 0.423%

    No Known Activations