Explanations
positive remarks or compliments within a longer text
Neuron Alignment
Index   Value   % of L₁
136     +0.12   0.4%
478     +0.11   0.4%
1045    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1140    +0.12      0.05
976     +0.11      0.04
755     +0.11      0.05
Negative Logits
maksi     -0.90
kasa      -0.89
saar      -0.82
alko      -0.79
uhr       -0.76
jaya      -0.74
karton    -0.73
Kartu     -0.72
makro     -0.71
keramik   -0.71
Positive Logits
so        0.75
<bos>     0.68
so        0.62
wieś      0.60
SO        0.59
injust    0.52
great     0.51
tanta     0.50
prestig   0.50
SO        0.50
Activations Density: 0.176%