INDEX
    Explanations

    names starting with Bert or Burt

    Negative Logits
    Token           Logit
     Pless          -0.74
    braio           -0.72
     ours           -0.70
     shouting       -0.69
     benötigen      -0.69
     coherent       -0.69
    omer            -0.68
    家用            -0.67
     screaming      -0.67
     françaises     -0.66
    Positive Logits
    Token           Logit
    ższy            0.81
     Olivenöl       0.81
    serving         0.81
    ALLENG          0.79
     Serving        0.79
    が上が          0.77
    Награды         0.76
     konci          0.74
     جز             0.73
    NIK             0.73
    Activation Density: 0.016%

    No Known Activations