INDEX
    Explanations

    quotations within texts

    dialogue or quotes in the text

    Negative Logits
     describ        -0.80
     looph          -0.79
     rul            -0.78
     favour         -0.78
     edged          -0.78
     moder          -0.77
     vegetarian     -0.76
     spill          -0.75
    ¥ŀ              -0.75
     discont        -0.74
    Positive Logits
    Everybody       1.87
    We              1.83
    They            1.81
    Honestly        1.81
    It              1.78
    Obviously       1.76
    Absolutely      1.75
    I               1.74
    Yeah            1.74
    Everything      1.72
    Activation Density: 0.153%

    No Known Activations