INDEX
Explanations
phrases related to permissions, religious beliefs, and genealogy
Neuron Alignment
Index   Value   % of L₁
50      +0.13   0.9%
429     +0.03   0.2%
1302    +0.02   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1671    +0.13      0.07
680     +0.03      0.07
1137    +0.02      0.09
Negative Logits
<bos>     -2.34
/***      -0.97
          -0.91
ⓧ         -0.90
<?        -0.78
/**       -0.77
<?        -0.76
public    -0.75
/*        -0.75
///**     -0.72
Positive Logits
stockholm   1.78
lidl        1.63
affor       1.62
wien        1.62
maneu       1.61
maroc       1.54
meis        1.50
accla       1.48
aen         1.47
ibiza       1.46
Activations Density: 2.404%