INDEX
Explanations
repetitive phrases or terms that emphasize recurrence
Neuron Alignment
Index   Value   % of L₁
129     +0.12   0.7%
313     +0.11   0.6%
163     +0.11   0.6%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
239     +0.12           0.09
87      +0.11           0.08
249     +0.11           0.07
Negative Logits
Token   Logit
ı       -2.06
Ĭ       -2.04
ķ       -2.02
Ĵ       -2.01
¸       -1.93
Ļ       -1.86
·       -1.84
Ļª      -1.80
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ   -1.78
¯       -1.76
Positive Logits
Token        Logit
version      1.60
blogspot     1.58
themselves   1.57
NOTICE       1.48
fire         1.48
adoc         1.46
keeper       1.46
ness         1.44
wise         1.43
ical         1.40
Activations Density: 0.159%