INDEX
    Explanations

    phrases related to motivations, decisions, and achievements

    special characters or unusual formatting in the text

    Negative Logits
     scatter       -0.59
     decomp        -0.52
     cyan          -0.51
     scattering    -0.51
     buggy         -0.51
     shack         -0.50
     bed           -0.49
     Nib           -0.48
     radar         -0.47
     coast         -0.47
    Positive Logits
    ¹    0.85
    £    0.83
    ı    0.79
    º    0.79
    ¡    0.78
    Ĵ    0.78
    ł    0.77
    į    0.76
    ¬    0.72
    §    0.72
    Act Density 0.493%

    No Known Activations