INDEX
Explanations
keywords related to trains and transportation infrastructure
Neuron Alignment
Index  Value  % of L₁
411    +0.14  0.5%
479    +0.13  0.5%
1961   +0.12  0.4%
Correlated Neurons
Index  P. Corr.  Cos Sim.
479    +0.14     0.03
544    +0.13     0.03
1806   +0.12     0.02
Negative Logits
nicolas  -0.77
roberto  -0.74
hek      -0.70
alberto  -0.68
gabri    -0.67
fortn    -0.66
lara     -0.65
kaos     -0.64
purcha   -0.64
sergio   -0.63
Positive Logits
train   1.52
train   1.39
Train   1.34
trains  1.32
Train   1.32
TRAIN   1.14
Trains  1.10
TRAIN   1.09
Trains  1.06
trains  1.05
Activations Density: 0.051%