INDEX
Explanations
prepositions and conjunctions used to indicate relationships between ideas and actions
Negative Logits
bu       -0.18
inner    -0.15
jl       -0.15
را       -0.15
zc       -0.14
ss       -0.14
angan    -0.14
onga     -0.14
eu       -0.14
bb       -0.14
Positive Logits
Hib            0.17
ocab           0.17
createClass    0.16
deaux          0.15
DOI            0.15
ãģĵãģĨ         0.14
NECT           0.14
GLOSS          0.14
edx            0.14
elper          0.14
Activations Density 0.522%