Explanations
Exclamation points used for emphasis or excitement
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
50      +0.19   1.2%
2034    +0.06   0.4%
605     +0.06   0.4%
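The "% of L₁" column is read here as each neuron's share of the L₁ norm of the feature's decoder vector in neuron space. That reading is an assumption, but it fits the rows above: +0.19 at 1.2% and +0.06 at 0.4% both imply an L₁ norm of roughly 15 to 16. The minimal sketch below uses a made-up 5-neuron decoder vector, and its `neuron_alignment` helper is hypothetical.

```python
import numpy as np

def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return (neuron index, weight, share of L1 norm) for the k largest-magnitude entries.

    Assumption: decoder_vec is the feature's decoder direction expressed in
    neuron space, and "% of L1" means |weight| / sum(|weights|).
    """
    l1 = np.abs(decoder_vec).sum()
    top = np.argsort(-np.abs(decoder_vec))[:k]  # indices sorted by descending |weight|
    return [(int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1)) for i in top]

# Toy decoder vector (made-up values, for illustration only).
w = np.array([0.05, -0.10, 0.40, 0.25, 0.20])
for idx, val, share in neuron_alignment(w):
    print(f"{idx:>4}  {val:+.2f}  {share:.1%}")
```

Sorting by absolute value means strongly negative weights would also count as aligned neurons, which is why the share is computed from |weight|.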
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
605     +0.19           0.07
109     +0.06           0.09
679     +0.06           0.09
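The two columns above use different similarity measures, which is why they can disagree for the same neuron. The dashboard does not say exactly which vectors enter each column. This sketch therefore only shows the two measures on hypothetical toy activation vectors: Pearson correlation is the cosine similarity of the mean-centered vectors, while cosine similarity uses the raw vectors.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    # Pearson correlation = cosine similarity after subtracting each vector's mean.
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine(x: np.ndarray, y: np.ndarray) -> float:
    # Cosine similarity on the raw (uncentered) vectors.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Toy per-token activations (made-up values, not from this feature).
feature = np.array([0.0, 0.0, 2.0, 0.0, 1.5])
neuron = np.array([0.3, -0.2, 0.9, 0.1, 0.4])
print(round(pearson(feature, neuron), 3), round(cosine(feature, neuron), 3))
```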
Negative Logits
<bos>     -1.61
<?        -1.04
ⓧ         -1.04
          -1.01
<?        -1.00
/***      -0.94
/**       -0.94
/*        -0.85
/*++      -0.81
///**     -0.80
Positive Logits
maneu     1.73
affor     1.58
accla     1.52
philanth  1.46
increa    1.45
véhic     1.43
shenan    1.41
impra     1.41
disagre   1.40
reluct    1.39
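Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the lowest- and highest-scoring tokens. Whether this dashboard does exactly that (for example, whether it folds in a final layer norm) is an assumption. The sketch below uses a hypothetical 4-dimensional residual space, a 6-token toy vocabulary, and random weights.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=2):
    """Return the k most negative and k most positive tokens by direct logit effect.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape (d_model, n_vocab);
    both are toy inputs here, not weights from any real model.
    """
    effects = decoder_dir @ W_U                   # (n_vocab,) logit change per token
    order = np.argsort(effects)                   # ascending order of effect
    neg = [(vocab[i], float(effects[i])) for i in order[:k]]
    pos = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return neg, pos

rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(6)]             # hypothetical vocabulary
W_U = rng.normal(size=(4, len(vocab)))            # hypothetical unembedding
d = rng.normal(size=4)                            # hypothetical feature direction
neg, pos = logit_effects(d, W_U, vocab)
print("negative:", neg)
print("positive:", pos)
```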
Activation Density: 0.274%
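Activation density is taken here to mean the fraction of tokens on which the feature's activation is strictly positive. This is an assumption about the dashboard's definition. At 0.274%, the feature would fire on roughly one token in 365. The sketch reproduces that figure with a toy activation array and a hypothetical `activation_density` helper.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    # Assumed definition: fraction of tokens whose activation exceeds the threshold.
    return float(np.mean(acts > threshold))

acts = np.zeros(100_000)
acts[:274] = 1.0  # toy data: the feature fires on 274 of 100,000 tokens
print(f"{activation_density(acts):.3%}")  # → 0.274%
```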