INDEX
    Explanations

    words indicative of theoretical discussions or reviews in academic contexts

    starting or beginning

    we start, begin, proceed

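    Negative Logits
    "another"          -0.53
    " another"         -0.52
    " weiteren"        -0.52   (German: "further")
    " weitere"         -0.51   (German: "further")
    " Another"         -0.51
    "Further"          -0.50
    " ytterligare"     -0.50   (Swedish: "additional")
    " autre"           -0.50   (French: "other")
    " weiterer"        -0.50   (German: "further")
    "還能"             -0.49   (Chinese: "can still")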
    Positive Logits
    " starts"           2.39
    " start"            2.29
    " starting"         2.19
    " begin"            2.08
    " begins"           2.05
    " Start"            2.05
    " Starting"         2.04
    " Starts"           2.03
    " started"          2.01
    " dimulai"          1.99   (Indonesian: "started")
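    The negative and positive logit lists above rank vocabulary tokens by how strongly this feature pushes each token's logit down or up. The sketch below shows one common way such lists are computed. It assumes, though this page does not say so, that each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The function name `top_logit_effects` and the toy weights are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption: the score for each token is the dot product of the
    feature's decoder direction with that token's unembedding column.
    """
    scores = w_dec_row @ w_u             # shape (vocab_size,)
    order = np.argsort(scores)           # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy model: d_model = 2, five-token vocabulary.
vocab = [" start", " begin", " another", " Further", " the"]
w_u = np.array([
    [2.0, 1.5, -0.5, -0.4, 0.1],   # unembedding row for dimension 0
    [0.5, -0.3, 1.0, 0.2, 0.9],    # unembedding row for dimension 1
])
w_dec_row = np.array([1.0, 0.0])   # hypothetical decoder direction

negative, positive = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

    With these toy weights, the most negative tokens are " another" and " Further" and the most positive are " start" and " begin". The real values listed here would come from the actual model's weights.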
    Activation density: 0.469%
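    Activation density is the share of tokens on which the feature fires. The sketch below assumes a token counts as active when the feature's activation is strictly greater than zero, since the page does not state its threshold. The function name `activation_density` and the activation values are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than zero by default.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# Hypothetical activations over 1,000 tokens, 5 of them nonzero.
acts = np.zeros(1000)
acts[:5] = [0.3, 1.2, 0.8, 2.1, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.500%
```

    At the 0.469% reported here, the feature fires on roughly 1 token in 213 (1 / 0.00469 ≈ 213).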

    No Known Activations