Explanations
repetitions and similarities in concepts and elements across various contexts
Negative Logits
rud      -0.17
amac     -0.15
ichen    -0.15
idunt    -0.15
oked     -0.15
SUCH     -0.15
ivec     -0.15
ัวร      -0.14
obili    -0.14
Aspect   -0.14
Positive Logits
as       0.36
как      0.23
als      0.20
quanto   0.18
что      0.17
as       0.16
than     0.16
wie      0.16
jako     0.16
except   0.16
Activations Density 0.097%