Explanations
adjectives and compound words
Negative Logits
ᚋ                  0.22
Benzyloxy          0.21
SNR                0.21
Reminder           0.20
اش                 0.20
Entities           0.20
Administration     0.20
Gosudarstvennyj    0.20
开始               0.20
鐏                 0.19
Positive Logits
rejected           0.24
deportation        0.22
},                 0.22
player             0.22
wrongful           0.21
rejecting          0.21
deport             0.21
wrongly            0.21
}                  0.21
rejection          0.21
Activations Density 0.002%