Explanations
instances of analogies and metaphors used for explanations
Negative Logits
iw          -0.15
ело         -0.14
ÙĤاء        -0.14
å¼ĥ         -0.14
olang       -0.14
angered     -0.14
ONGO        -0.13
wrap        -0.13
à¥įसर       -0.13
عز          -0.13
Positive Logits
example     0.16
ubi         0.15
uki         0.15
ahi         0.15
Ridley      0.14
ä¾ĭ         0.14
analogy     0.14
bidi        0.14
permalink   0.14
929         0.14
Activations Density: 0.179%