INDEX
    Explanations

    Nor or Nora followed by specific words

    Negative Logits
    Token          Logit
    " Марина"      0.44
    "રાજ"          0.40
    "MOTION"       0.39
    " жер"         0.39
    " Snapshot"    0.38
    "做什么"       0.38
    " jato"        0.38
    "കോ"           0.38
    " জিয়া"        0.37
    "нету"         0.37

    Positive Logits
    Token          Logit
    "wegian"       0.70
    " нор"         0.64
    " Нор"         0.59
    "therners"     0.58
    " Norfolk"     0.57
    " NOR"         0.55
    " Norwegian"   0.54
    " Nor"         0.53
    "ノル"         0.53
    " Norweg"      0.53

    Activation Density: 0.006%

    No Known Activations