    Explanations

    parts of language pertaining to categorization and classification

    Negative Logits
     bershka       -1.10
     Monfieur      -1.05
     itſelf        -1.00
     Theſe         -1.00
     Shakspeare    -0.97
     myſelf        -0.95
     raiſ          -0.94
     uſed          -0.94
     Efq           -0.93
     moschino      -0.92
    Positive Logits
     ber            0.54
     said           0.49
    TagHelper       0.48
                    0.46
     em             0.46
     di             0.46
     to             0.45
                    0.43
     for            0.43
     the            0.42
    Activation Density: 0.102%

    No Known Activations