INDEX
    Explanations

    references to personal feelings and interactions

    Negative Logits
    Token           Logit
    " mourut"       -0.83
    " geweest"      -0.78
    "Enjoyed"       -0.74
    " belonged"     -0.68
    "Were"          -0.67
    "进行了"        -0.66
    " consisted"    -0.66
    " existed"      -0.65
    " wasnt"        -0.65
    "提供了"        -0.65
    Positive Logits
    Token           Logit
    " make"         1.22
    " take"         1.21
    " get"          1.14
    " decide"       1.04
    " come"         1.04
    " give"         1.00
    " go"           0.99
    " try"          0.99
    " bring"        0.97
    " create"       0.95
    Activation Density: 0.462%

    No Known Activations