INDEX
    Explanations

    phrases indicating initial perceptions or actions

    Negative Logits
    Token                   Logit
    " Chwiliwch"            -0.71
    "Hentet"                -0.70
    "Życiorys"              -0.66
    "Tembelea"              -0.66
    "AddTagHelper"          -0.64
    " nahilalakip"          -0.63
    "MessageTagHelper"      -0.62
    " ostavi"               -0.62
    " kasarigan"            -0.60
    "RTSC"                  -0.60
    Positive Logits
    Token                   Logit
    " Initially"            0.61
    "Initially"             0.57
    "最初は"                0.47
    " Sometimes"            0.43
    " pikir"                0.41
    " awalnya"              0.40
    "Sometimes"             0.39
    " initially"            0.39
    " Monfieur"             0.38
    " Usually"              0.36
    Activation Density: 0.015%

    No Known Activations