INDEX
Explanations
phrases related to food preferences and health concerns
Neuron Alignment
Index    Value    % of L₁
50       +0.18    1.0%
1896     +0.07    0.4%
545      +0.06    0.3%
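The dashboard does not say how the "% of L₁" column is derived. A minimal sketch, assuming it is each entry's absolute value divided by the L₁ norm (sum of absolute values) of the whole vector; the function name `l1_share` and the toy input are hypothetical:

```python
def l1_share(values):
    """Return each entry's share of the vector's L1 norm (assumed definition)."""
    total = sum(abs(v) for v in values)  # L1 norm of the vector
    if total == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(v) / total for v in values]


# Toy vector, not taken from the dashboard.
shares = l1_share([1.0, -3.0])
print([f"{s:.1%}" for s in shares])
```

Under this assumption, a value of +0.18 at 1.0% would imply an L₁ norm of roughly 18 for the full vector, which is broadly in line with the other two rows once rounding is taken into account.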
Correlated Neurons
Index    P. Corr.    Cos Sim.
545      +0.18       0.09
1896     +0.07       0.07
2034     +0.06       0.08
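The two columns above are not defined on the page. A rough sketch, assuming "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two direction vectors; the function names and toy inputs are hypothetical:

```python
import math


def pearson_corr(x, y):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(u, v):
    """Cosine of the angle between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy activations, not taken from the dashboard.
feature_acts = [0.0, 0.0, 1.5, 0.0, 0.7]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.3]
print(round(pearson_corr(feature_acts, neuron_acts), 3))
```

The two measures answer different questions: correlation compares behavior over data, while cosine similarity compares weight directions, so a neuron can rank high on one and low on the other.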
Negative Logits
<bos>      -1.93
(blank)    -1.05
ⓧ          -1.01
/**        -0.96
/***       -0.91
<?         -0.88
/*         -0.77
/*!        -0.75
<?         -0.74
///**      -0.71
Positive Logits
maroc        1.18
maneu        1.17
véhic        1.14
accla        1.07
affor        1.06
stockholm    1.04
Juf          1.04
lele         1.02
catég        1.02
embodi       1.00
Activations Density 0.206%
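The page gives the density figure without a definition. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name, the threshold parameter, and the toy data are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: 2 active tokens out of 1000, not taken from the dashboard.
sample = [0.0] * 998 + [0.5, 1.2]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.206% means the feature fires on roughly 2 tokens in every 1000.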