Explanations
Discussions of political and legislative struggles, specifically changes in laws and policies.
Complex or technical terminology, especially policy, legislative, and administrative language.
Word fragments split by hyphens, particularly at syllable breaks.
Subword tokenization patterns where words are segmented into smaller linguistic units by a tokenizer, visible as partial syllables and morphemes across diverse text contexts.
Neuron Alignment
Index   Value   % of L₁
876     +0.11   0.3%
1499    +0.10   0.3%
674     +0.09   0.3%
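The page does not define the "% of L₁" column. A minimal sketch of one plausible reading, assuming it means a single neuron's absolute weight divided by the sum of absolute weights over all neurons (the vector's L₁ norm). The function name `l1_share` and the toy vector are hypothetical and are not the feature's real weights.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" is read as abs(value) / sum(abs(all values)).
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, chosen only so the arithmetic is easy to check.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, a share of 0.3% for the top neuron would mean the weight is spread thinly across many neurons rather than concentrated in a few.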
Correlated Neurons
Index   P. Corr.   Cos Sim.
392     +0.11      0.01
1401    +0.10      0.02
1032    +0.09      0.02
Negative Logits
Czym          -0.69
Eksteraj      -0.67
Ekster        -0.64
Răsp          -0.64
Literat       -0.64
Dicas         -0.61
Și            -0.61
Weiterlesen   -0.60
Kleur         -0.60
Embal         -0.60
Positive Logits
dises    1.31
<bos>    1.31
dispen   1.27
erec     1.26
effe     1.24
seiz     1.20
nece     1.19
haer     1.19
haup     1.19
tamen    1.19
Activations Density: 0.054%