INDEX
Explanations
words related to achievements, acknowledgments, and celebrations
Neuron Alignment
Index   Value   % of L₁
1967    +0.13   0.4%
1705    +0.12   0.4%
1174    +0.10   0.3%
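The dashboard does not say how "% of L₁" is computed. A plausible reading is that the feature's decoder vector is expressed over MLP neurons, "Value" is that vector's weight on one neuron, and "% of L₁" is that weight's magnitude as a share of the vector's total L₁ norm. The hypothetical `neuron_alignment` function below is a minimal sketch under those assumptions. It also assumes that rows are ranked by signed value rather than by magnitude.

```python
def neuron_alignment(decoder_row, top_k=3):
    """Hypothetical helper: rank neurons by their weight in a feature's decoder row.

    Assumes decoder_row[i] is the feature's weight on neuron i and that
    "% of L1" means |weight| / sum(|weights|) * 100. Both are assumptions,
    not a documented definition.
    """
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the whole decoder row
    # Rank neuron indices by signed weight, largest first (assumed ordering).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in ranked[:top_k]]


print(neuron_alignment([0.125, -0.25, 0.375, 0.25]))
```

Under this reading, a value of +0.13 that accounts for 0.4% of L₁ implies a total L₁ norm of roughly 0.13 / 0.004 ≈ 32 across all neurons.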
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1363    +0.13           0.08
1892    +0.12           0.06
509     +0.10           0.07
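The two correlation columns relate closely if both compare the feature's per-token activations with a neuron's per-token activations over the same text. The dashboard does not confirm this; it is an assumption here. Under that assumption, Pearson correlation is simply the cosine similarity of the two activation sequences after each has been mean-centered, as this sketch shows.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length activation sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


feature_acts = [0.0, 1.0]  # hypothetical per-token feature activations
neuron_acts = [1.0, 0.0]   # hypothetical per-token neuron activations
print(cosine_similarity(feature_acts, neuron_acts), pearson_correlation(feature_acts, neuron_acts))
```

Centering explains why the two columns can differ. Two sequences that never overlap have cosine similarity 0, yet once each is shifted by its mean they are perfectly anti-correlated (Pearson −1).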
Negative Logits
effe       -1.97
wien       -1.91
embra      -1.86
desir      -1.85
fte        -1.80
purcha     -1.79
„,         -1.77
guarante   -1.76
inder      -1.76
pessi      -1.75
Positive Logits
kasarigan  0.65
]=="       0.62
于         0.61
wavering   0.61
改为       0.61
]!='       0.60
나는       0.60
]>=        0.60
因为       0.59
ništvo     0.59
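Logit tables like the two above are commonly built by projecting the feature's decoder vector through the model's unembedding matrix. Each vocabulary token's score is then listed, lowest first for the negative table and highest first for the positive one. This sketch assumes that projection and ignores any final layer norm. The helper names and the toy matrices are hypothetical.

```python
def logit_effects(decoder_vec, unembed):
    """Score each token by decoder_vec @ unembed.

    unembed[d][t] is the (assumed) weight from residual dimension d to token t.
    """
    n_tokens = len(unembed[0])
    return [sum(decoder_vec[d] * unembed[d][t] for d in range(len(decoder_vec)))
            for t in range(n_tokens)]


def top_and_bottom(effects, vocab, k):
    """Return the k highest-scoring tokens (descending) and k lowest (ascending)."""
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


vocab = ["a", "b", "c"]                        # toy vocabulary
unembed = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]]   # toy 2 x 3 unembedding
effects = logit_effects([1.0, -1.0], unembed)
print(top_and_bottom(effects, vocab, k=1))
```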
Activation Density: 0.595%
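Activation density is usually read as the percentage of tokens on which the feature fires, meaning its activation is above zero. The dashboard gives neither the threshold nor the size of the token sample, so both are assumptions in this sketch.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 2 of 8 tokens.
print(activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))
```

On this reading, 0.595% corresponds to firing on about 595 of every 100,000 tokens.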