INDEX
Explanations
questions or statements regarding uncertainty or lack of knowledge
Neuron Alignment
Index   Value   % of L₁
50      +0.20   0.9%
1376    +0.12   0.5%
220     +0.10   0.4%
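To help read the table above, here is a minimal sketch. It assumes that "Value" is a weight in the feature's decoder vector and that "% of L₁" is that weight's magnitude divided by the vector's L1 norm. The function name `neuron_alignment`, the ranking by absolute value, and the toy numbers are illustrative assumptions, not the dashboard's own code.

```python
def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: top-k neurons of a feature's decoder vector.

    Returns (neuron index, signed weight, |weight| / L1 norm) tuples.
    Assumption: neurons are ranked by the magnitude of their weight.
    """
    # L1 norm of the whole decoder vector: the denominator for "% of L1".
    l1 = sum(abs(w) for w in decoder_row)
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector with made-up weights.
row = [0.0, 0.5, -0.25, 0.25]
for idx, val, share in neuron_alignment(row, k=2):
    print(f"{idx:>5}  {val:+.2f}  {share:.1%}")
```

Ranking by magnitude means that a strongly negative weight would also appear in the table. If the dashboard ranks by signed value instead, drop the `abs` in the sort key.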
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
991     +0.20           0.06
1376    +0.12           0.05
581     +0.10           0.06
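A minimal sketch of the two columns. It assumes that each neuron is scored by comparing its activations with the feature's activations over the same tokens. Pearson correlation centers both series before comparing them, and cosine similarity does not, so the two numbers can differ sharply. The helper names and the dict-of-activations input are hypothetical.

```python
import math


def pearson(x, y):
    # Centered correlation; undefined (division by zero) if either series is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(x, y):
    # Uncentered: compares the raw activation vectors directly.
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: neuron_acts maps neuron index -> activations
    on the same tokens as feature_acts. Ranked by Pearson correlation."""
    scores = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
              for i, acts in neuron_acts.items()]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]


feature = [0.0, 0.0, 2.0, 0.0]
neurons = {7: [0.1, 0.2, 1.0, 0.1], 9: [1.0, 1.0, 0.0, 1.0]}
print(correlated_neurons(feature, neurons, k=2))
```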
Negative Logits
Token    Logit
<bos>    -2.67
ⓧ        -0.83
/**      -0.80
/*       -0.74
         -0.73
/***     -0.73
#        -0.73
<?       -0.70
///**    -0.67
#![      -0.67
Positive Logits
Token    Logit
Juf      1.83
ftu      1.74
aen      1.71
thut     1.67
fta      1.65
fays     1.64
fortn    1.64
maneu    1.62
sovere   1.60
ftre     1.60
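One common way to produce positive and negative logit lists like these is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. The sketch below assumes that method and ignores any final LayerNorm. It also uses hypothetical names (`logit_effects`, `unembed`, `vocab`). The dashboard's exact computation may differ.

```python
def logit_effects(decoder_row, unembed, vocab, k=3):
    """Hypothetical helper: direct logit effect of one feature direction.

    decoder_row: length-d_model list (the feature's decoder vector).
    unembed:     d_model x vocab_size matrix, given as a list of rows.
    vocab:       token strings, one per unembedding column.
    Returns (top-k positive, top-k negative) lists of (token, logit).
    """
    d_model = len(decoder_row)
    # logit[t] = sum over d of decoder_row[d] * unembed[d][t]
    logits = [sum(decoder_row[d] * unembed[d][t] for d in range(d_model))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary (made-up numbers).
pos, neg = logit_effects([1.0, 0.0],
                         [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]],
                         ["a", "b", "c"], k=1)
print(pos, neg)
```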
Activation Density 0.376%
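Activation density is presumably the fraction of tokens on which the feature fires. Under that reading, 0.376% means roughly 376 active tokens per 100,000 (0.00376 × 100,000 = 376). Here is a minimal sketch under that assumption. The function name and its `threshold` parameter are hypothetical, and the threshold defaults to counting only strictly positive activations.

```python
def activation_density(acts, threshold=0.0):
    """Hypothetical helper: share of tokens whose activation exceeds threshold."""
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Toy activations for four tokens (made-up numbers).
print(f"{activation_density([0.0, 0.0, 1.5, 0.0]):.3%}")
```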