INDEX
Explanations
understanding certain languages
Negative Logits
almost    -0.13
Almost    -0.11
Almost    -0.11
both      -0.11
Various   -0.10
mostly    -0.10
rather    -0.09
både      -0.09
igar      -0.09
casi      -0.09
Positive Logits
certain    0.40
certains   0.29
Certain    0.29
Certain    0.27
某         0.24
bestimm    0.22
ertain     0.21
some       0.19
有些       0.18
older      0.17
Activations Density 0.100%