Explanations
key terms related to reasoning and justification in various contexts
Negative Logits
Has         -0.17
isko        -0.15
Has         -0.15
zh          -0.14
iso         -0.14
Makes       -0.14
-has        -0.14
ISO         -0.14
HAS         -0.14
Provides    -0.13
Positive Logits
is          0.56
的是        0.51
adalah      0.40
就是        0.34
are         0.33
是          0.32
のは        0.31
是在        0.31
was         0.30
là          0.30
Activations Density: 0.491%