INDEX
    Explanations

    names of specific individuals

    Negative Logits
    Flere        -0.84
     Kruse       -0.70
    elegante     -0.65
    Hvordan      -0.65
    Hvem         -0.65
     Hermans     -0.62
     Schreiber   -0.58
    Hvorfor      -0.57
    Hvor         -0.57
     Schröder    -0.56
    Positive Logits
     stopp       0.90
     paff        0.89
     udd         0.81
     bandung     0.81
     noss        0.78
     tass        0.78
     obb         0.78
     Krzysz      0.78
     fupp        0.77
     milano      0.77
    Activation Density 0.411%

    No Known Activations