INDEX
    Explanations

    references to specific individuals or names in the text

    Negative Logits
    rix           -0.17
    ustin         -0.16
    ictions       -0.15
    geç           -0.15
    igon          -0.15
     hierarchy    -0.15
    625           -0.15
    uns           -0.15
    Äħd           -0.15
    plain         -0.14
    Positive Logits
    ashtra        0.23
    agh           0.22
    ajs           0.21
    itur          0.21
    ishi          0.20
    ames          0.20
    angan         0.19
    aja           0.19
    AMES          0.18
    atan          0.17
    Activation Density: 0.036%

    No Known Activations