INDEX
    Explanations

    references to literary figures or works

    Negative Logits
    "thouse"    -0.17
    "anoia"     -0.16
    "tek"       -0.16
    "gage"      -0.15
    "onium"     -0.14
    "949"       -0.14
    " UCLA"     -0.14
    "iglia"     -0.14
    "strar"     -0.13
    "uttle"     -0.13
    Positive Logits
    " Hem"      0.35
    "hem"       0.25
    " hem"      0.20
    " bull"     0.18
    " Ernest"   0.18
    " Stein"    0.18
    "çĢ"        0.17
    " Cub"      0.16
    " suck"     0.16
    "pic"       0.15
    Activation Density: 0.005%

    No Known Activations