INDEX
    Explanations

    Proper nouns, particularly names and specific entities.

    Negative Logits
    featureID        -0.56
     trapping        -0.55
    catching         -0.48
    hobby            -0.47
    expandindo       -0.47
    mbic             -0.45
    mixing           -0.45
     autorytatywna   -0.45
    GTCX             -0.45
    fixing           -0.44
    Positive Logits
     nahilalakip     0.44
     dotyczą         0.41
     completas       0.40
     acepta          0.39
     simplemente     0.39
     uniwers         0.39
     recue           0.38
    Eloquent         0.37
    īdz              0.37
     Lengkap         0.37
    Activation Density: 0.138%

    No Known Activations