INDEX
Explanations
phrases related to positions or rankings in a competitive context
Neuron Alignment
Index   Value   % of L₁
50      +0.18   1.0%
889     +0.15   0.8%
680     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
889     +0.18      0.05
1065    +0.15      0.04
1604    +0.13      0.03
Negative Logits
<bos>       -3.05
ⓧ           -0.83
/***        -0.80
/**         -0.76
/*          -0.66
            -0.64
Autoritní   -0.62
introduce   -0.60
Kontrola    -0.59
Хро         -0.59
Positive Logits
stockholm   1.65
Minang      1.64
maneu       1.61
affor       1.57
disagre     1.55
lidl        1.54
emphat      1.53
impra       1.51
Juf         1.50
accla       1.50
Activations Density 0.091%