INDEX
Explanations
mentions of specific or definitive characteristics and details within a text
Neuron Alignment
Index    Value    % of L₁
50       +0.21    1.2%
1339     +0.14    0.8%
893      +0.12    0.7%
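
The "% of L₁" column appears to give each neuron's value as a share of the L₁ norm (the sum of absolute values) of the full alignment vector. Below is a minimal sketch of that arithmetic, assuming this reading of the column. The function name `l1_share` and the toy vector are hypothetical and do not come from the dashboard.

```python
def l1_share(values):
    """Return each entry's absolute value as a fraction of the vector's L1 norm."""
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(v) / l1_norm for v in values]


# Toy vector for illustration only (not the real feature's weights).
toy = [0.5, -0.25, 0.25]
shares = l1_share(toy)
```

Under this reading, a value of +0.21 listed at 1.2% would imply an L₁ norm of roughly 17.5 for the full vector.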
Correlated Neurons
Index    P. Corr.    Cos Sim.
1339     +0.21       0.04
893      +0.14       0.04
680      +0.12       0.03
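
The two columns are different similarity measures: Pearson correlation centers each series on its mean before comparing, while cosine similarity compares the raw vectors. The sketch below shows the standard definitions only. Which quantities the dashboard actually compares is not stated here, and the function names and sample data are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def cosine(xs, ys):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


# Illustrative data: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
p = pearson(a, b)   # the offset is removed by mean-centering
c = cosine(a, b)    # the offset changes the angle between the raw vectors
```

Because the two measures respond differently to offsets and scale, a neuron can show a noticeable correlation while its cosine similarity stays close to zero.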
Negative Logits
<bos>       -3.43
/***        -0.84
///**       -0.74
protected   -0.67
public      -0.67
ⓧ           -0.65
(blank)     -0.63
//{         -0.61
/*!         -0.61
assistir    -0.59
Positive Logits
lidl        1.30
lele        1.27
milano      1.25
maroc       1.24
tramont     1.24
bandung     1.23
toledo      1.21
meis        1.21
wien        1.21
maneu       1.20
Activations Density 0.074%