INDEX
    Explanations

    phrases introducing new information or topics

    Negative Logits
    Token           Logit
    " otherwise"    -0.76
    "isation"       -0.71
    " od"           -0.64
    " morale"       -0.64
    " unit"         -0.62
    " attempts"     -0.61
    " drain"        -0.60
    "lling"         -0.60
    " tyres"        -0.60
    " spiral"       -0.59

    Positive Logits
    Token           Logit
    "Here"          3.04
    " Here"         2.19
    "Below"         2.09
    "Let"           1.60
    " Below"        1.55
    "Now"           1.43
    "There"         1.41
    "Again"         1.41
    "here"          1.37
    "Above"         1.36

    Activation Density: 0.009%

    No Known Activations