Explanations
phrases that describe concepts
Negative Logits
撗  0.23
所  0.23
ഇത്തരം  0.22
whom  0.22
lakini  0.22
nhưng  0.22
яку  0.22
wobei  0.22
പറയുന്നു  0.21
परंतु  0.21
Positive Logits
that  0.47
thats  0.36
solely  0.31
specifically  0.30
designed  0.30
that  0.30
explicitly  0.30
purely  0.29
thay  0.28
deemed  0.28
Activations Density: 0.554%