INDEX
    Explanations

    phrases that suggest the existence of potential implications or outcomes

    Negative Logits
    Token              Logit
    " either"          -0.19
    " indeed"          -0.18
    "either"           -0.18
    " both"            -0.18
    " både"            -0.18
    " first"           -0.16
    " Either"          -0.15
    "既"               -0.15
    " accordingly"     -0.15
    "both"             -0.15

    Positive Logits
    Token              Logit
    " other"            0.26
    " also"             0.24
    " equally"          0.24
    "other"             0.24
    "also"              0.22
    " another"          0.22
    "Also"              0.20
    " otras"            0.19
    " également"        0.19
    " ALSO"             0.19
    Act Density 0.518%
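
    The page does not say how the Act Density figure is computed. A common reading, assumed here, is the share of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name activation_density, the threshold parameter, and the sample data are hypothetical and do not come from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.7, 1.2, 0.3, 2.1, 0.05]
density = activation_density(sample)
print(f"Act Density {density:.3%}")  # prints "Act Density 0.500%"
```

    Under this assumption, a density of 0.518% would mean the feature fires on roughly 5 of every 1000 sampled tokens.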

    No Known Activations