INDEX
Explanations
occurrences of the word "other"
Neuron Alignment
Index   Value   % of L₁
50      +0.15   0.8%
1984    +0.07   0.4%
1265    +0.06   0.3%
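The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that each neuron's value is a component of the feature's weight vector, and the percentage is that component's absolute value divided by the vector's L₁ norm (the sum of absolute values over all neurons). A minimal Python sketch under that assumption follows; the function names `l1_share` and `top_aligned` are hypothetical and do not come from the dashboard.

```python
def l1_share(weights):
    """Return |w_i| / sum_j |w_j| for each component (assumed meaning of '% of L1')."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("weight vector has zero L1 norm")
    return [abs(w) / total for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, l1_share) for the k components with the largest |value|."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


if __name__ == "__main__":
    # Toy vector, not data from the dashboard.
    for index, value, share in top_aligned([0.5, -1.5, 2.0], k=2):
        print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, small percentages such as those in the table would indicate that the feature's weight is spread across many neurons instead of being concentrated in a few.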
Correlated Neurons
Index   P. Corr.   Cos Sim.
1910    +0.15      0.08
1984    +0.07      0.08
1504    +0.06      0.07
Negative Logits
Token      Logit
<bos>      -2.35
(blank)    -0.95
/*         -0.90
<?         -0.88
ⓧ          -0.87
/**        -0.84
/***       -0.82
lateinit   -0.77
#          -0.73
ɵɵ         -0.72
Positive Logits
Token      Logit
affor      2.20
maneu      2.19
accla      2.00
impra      1.98
increa     1.96
disagre    1.91
Intere     1.91
scrat      1.90
ibiza      1.88
excru      1.87
Activations Density: 0.163%