INDEX
    Explanations

    phrases indicating existence or presence of certain entities within a context

    Negative Logits
    Token           Logit
    " للمعارف"      -0.75
    " Neville"      -0.68
    "lemb"          -0.65
    " kereszt"      -0.64
    " piş"          -0.64
    " Vertrauen"    -0.64
    " Cardona"      -0.63
    " houſe"        -0.63
    " geloof"       -0.62
    "bVar"          -0.61
    Positive Logits
    Token        Logit
    " der"       1.09
    "Der"        0.97
    " Die"       0.95
    "Οι"         0.94
    " Der"       0.93
    "Die"        0.91
    " DER"       0.89
    " dieser"    0.89
    " ihrer"     0.85
    " Οι"        0.84
    Activation Density: 0.027%

    No Known Activations