Explanations
belief, expectation, and knowledge
Negative Logits
في    0.39
I    0.39
in    0.38
在    0.38
и    0.38
ﻟ    0.37
G    0.37
,    0.36
да    0.36
،    0.35
Positive Logits
been    0.49
ä    0.48
fertilisers    0.44
flavours    0.40
dL    0.39
licences    0.39
র    0.39
बड़े    0.39
る    0.38
ارى    0.38
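The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not describe how the scores are computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses NumPy. The function name `top_logit_effects` and its arguments (`w_dec_row`, `w_u`, `vocab`) are hypothetical, and the toy matrices stand in for real model weights. In this sketch the negative list holds the most negative scores.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    w_dec_row: (d_model,) decoder direction for the feature (assumed input)
    w_u:       (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:     list of d_vocab token strings
    """
    # Direct contribution of the feature direction to every token's logit.
    effects = np.asarray(w_dec_row) @ np.asarray(w_u)  # shape (d_vocab,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.1, 0.9],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_effects(w_dec_row, w_u, ["a", "b", "c", "d"], k=2)
print(neg)
print(pos)
```

Real dashboards may also rescale or normalize these scores, so the absolute values above should not be compared directly with the ones on this page.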
Activation Density: 0.593%
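Activation density is usually read as the share of sampled token positions on which the feature is active. Under that reading, 0.593% means the feature fires on roughly 6 of every 1,000 tokens. The sketch below assumes "active" means an activation strictly above zero. Both the threshold and the helper name `activation_density` are assumptions for illustration, not taken from this page.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(acts, dtype=float)
    # Count positions where the feature fires, then express as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy activations for one feature over four token positions.
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```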