INDEX
Explanations
conjunctions and connectors that indicate relationships or contrasts in ideas
Negative Logits
there     -0.15
alike     -0.14
nton      -0.14
they      -0.14
we        -0.14
urn       -0.13
assis     -0.13
ня        -0.12
it        -0.12
.Null     -0.12
Positive Logits
whose     0.70
whose     0.58
which     0.56
which     0.47
whom      0.42
которые   0.40
who       0.39
cui       0.38
который   0.38
który     0.37
Activations Density 0.185%