    Explanations

    references to specific films and cultural phenomena

    Negative Logits
    Token             Logit
    /                 -0.40
     «                -0.39
     David            -0.39
     A                -0.39
     or               -0.37
     “                -0.36
     as               -0.35
     rí               -0.35
     recherche        -0.34
     "                -0.34
    Positive Logits
    Token             Logit
     myſelf            1.12
    MemoryWarning      1.12
     themſelves        1.03
     ſtate             0.98
     ſta               0.98
     Shakspeare        0.97
    featureID          0.96
     himſelf           0.96
     ſche              0.95
    ſelf               0.93
    Activation Density: 0.257%

    No Known Activations