    Explanations

    Sections of academic papers, specifically introductions and conclusions

    Introduction, conclusion, discussion

    NEGATIVE LOGITS
    " queſta"          -0.77
    "styleType"        -0.75
    "ſſung"            -0.75
    "<unused68>"       -0.73
    "<unused17>"       -0.73
    "[@BOS@]"          -0.73
    "<unused14>"       -0.73
    "<unused16>"       -0.73
    "<unused8>"        -0.73
    "<unused3>"        -0.73

    POSITIVE LOGITS
    "Introduction"      0.57
    " Introduction"     0.55
    "INTRODUCTION"      0.44
    " introduction"     0.42
    "introduction"      0.40
    " introducción"     0.39
    "Introdu"           0.37
    " INTRODUCTION"     0.37
    " Introdu"          0.35
    " rosca"            0.34
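    The positive and negative logit lists are usually read as the vocabulary tokens whose output logits this feature most promotes or suppresses. The following is a minimal sketch of one common way to produce such lists. It assumes each token's score is the dot product of the feature's decoder vector with that token's unembedding vector. The dashboard's exact computation is not stated here. `top_logit_effects` and the toy vocabulary and vectors are hypothetical.

```python
# Hypothetical sketch of a "logit lens" view of one SAE feature.
# Assumption: each token's score is the dot product of the feature's decoder
# vector with that token's unembedding vector; a real dashboard may also
# apply a final layer norm or other scaling.
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logit_effects(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Return the k most negative and k most positive (token, score) pairs."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy 2-d "model" with a hypothetical four-token vocabulary.
toy_unembed = {
    "Introduction": [0.9, 0.1],
    " introduction": [0.7, 0.0],
    "styleType": [-0.8, 0.2],
    "<unused3>": [-0.6, -0.1],
}
neg, pos = top_logit_effects([1.0, 0.5], toy_unembed, k=2)
for tok, score in pos:
    print(f"{tok!r:>18} {score:+.2f}")
```

    With a real model, `decoder_vec` would be one row of the SAE decoder and `unembed` the model's full unembedding. Both are assumptions about the setup. Leading spaces are kept in the token strings because tokenizers treat " Introduction" and "Introduction" as distinct tokens.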
    Activation density: 0.002%
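    This sketch assumes activation density is the fraction of sampled tokens on which the feature's activation is strictly positive. The dashboard's exact threshold and token sample are not given here. Under that reading, 0.002% means roughly 2 firing tokens per 100,000. `activation_density` and the toy activation list are hypothetical.

```python
# Hypothetical sketch: activation density as the share of tokens on which a
# feature fires (activation above a threshold, assumed here to be 0.0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)  # count firing tokens
    return firing / len(acts)


# Toy sample: 2 firing tokens out of 100,000.
toy_acts = [0.0] * 99_998 + [3.1, 0.4]
density = activation_density(toy_acts)
print(f"{density:.3%}")
```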

    No Known Activations