INDEX
Explanations
negative words, particularly adjectives describing things as stupid
Neuron Alignment
Index    Value    % of L₁
50       +0.15    0.7%
1520     +0.10    0.5%
1742     +0.10    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1363     +0.15       0.04
1601     +0.10       0.03
147      +0.10       0.02
Negative Logits
<bos>       -2.72
/***        -0.73
ⓧ           -0.73
            -0.72
HasIndex    -0.70
})();       -0.70
/*!         -0.63
define      -0.63
насељу      -0.62
///**       -0.61
Positive Logits
tramont      1.55
sappi        1.39
stockholm    1.25
maroc        1.22
milano       1.17
ibiza        1.16
riviera      1.15
Khart        1.15
cristina     1.14
lidl         1.13
Activations Density: 0.156%