INDEX
Explanations
phrases related to the concept of distance or location
Neuron Alignment
Index    Value    % of L₁
1967     +0.11    0.3%
1741     +0.11    0.3%
1385     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
939      +0.11       0.04
1967     +0.11       0.04
1804     +0.11       0.04
Negative Logits
🥲        -0.58
😭😭      -0.57
🤣🤣      -0.56
cirque    -0.55
ypeł      -0.55
🥲        -0.55
😌        -0.54
❤️❤️      -0.53
😌        -0.53
amanda    -0.53
Positive Logits
tamen       0.55
MÁ          0.54
tabu        0.54
Okt         0.53
Gorb        0.50
geograf     0.49
antik       0.49
polig       0.48
monaster    0.48
Schrö       0.48
Activations Density: 0.208%