INDEX
Explanations
comparisons between fictional characters or entities in various scenarios
Neuron Alignment
Index    Value    % of L₁
1343     +0.31    1.2%
184      +0.18    0.7%
1356     +0.13    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1343     +0.31       0.04
184      +0.18       0.02
1356     +0.13       0.02
Negative Logits
onore         -0.75
Viitteet      -0.73
virtù         -0.70
<bos>         -0.66
lusso         -0.61
تقاوى         -0.60
pét           -0.60
affitto       -0.60
виправивши    -0.57
onor          -0.57
Positive Logits
FTFY            0.60
useAuth         0.56
:"-             0.49
SneakyThrows    0.49
Ehh             0.48
Ikr             0.48
חיצוני          0.48
Lmao            0.46
Wtf             0.46
impelled        0.46
Activations Density: 0.172%