INDEX
    Explanations

    presenting research actions

    Negative Logits
    "どんどん"  0.43
    " magari"  0.43
    "がたくさん"  0.41
    " telling"  0.40
    " выход"  0.40
    " forcément"  0.39
    " vraiment"  0.39
    " Кстати"  0.39
    "Basically"  0.39
    " obviamente"  0.39
    Positive Logits
    " demonstrate"  0.79
    " demonstrated"  0.73
    " discuss"  0.71
    " presented"  0.67
    " demonstrates"  0.67
    " Demonstrate"  0.63
    "discuss"  0.61
    " propose"  0.61
    " investigate"  0.59
    " discusses"  0.59
    Activation Density: 0.010%

    No Known Activations