INDEX
    Explanations

    exact words or phrases that are similar or repeated for emphasis

    Negative Logits
    ker       -0.94
    rift      -0.83
    itiz      -0.82
    olyn      -0.80
    kers      -0.78
    jay       -0.78
    rug       -0.74
    roe       -0.74
    asta      -0.74
    isson     -0.73
    Positive Logits
    ãĤ¨           0.89
     opposite     0.86
     aligned      0.83
     wrong        0.83
     matched      0.75
     suited       0.75
     positioned   0.75
    æ©Ł           0.74
    Els           0.74
     tuned        0.71
    Activation Density 8.570%

    No Known Activations