INDEX
    Explanations
    Negative Logits
    " start"       -1.79
    "start"        -1.69
    " Start"       -1.61
    "Start"        -1.60
    " Starting"    -1.56
    "Starting"     -1.54
    " starting"    -1.53
    " starts"      -1.48
    "starting"     -1.45
    " START"       -1.34
    Positive Logits
    "IndentedString"    0.65
    " ujednoznacz"      0.63
    " ageing"           0.63
    " to"               0.63
    "ling"              0.60
    "les"               0.60
    " behaving"         0.60
    " off"              0.60
    "脚注の使い方"      0.59
    " practising"       0.56
    Activation Density: 0.060%

    No Known Activations