INDEX
Explanations
the number 42
the repeated occurrence of the number "42" in various contexts
Negative Logits
undai        -0.99
icago        -0.92
kered        -0.87
jriwal       -0.86
itent        -0.82
urus         -0.80
rosse        -0.80
prosec       -0.79
keye         -0.77
temptation   -0.77
Positive Logits
APH          0.92
41           0.86
42           0.82
44           0.81
50           0.78
¢            0.76
RD           0.75
00           0.75
43           0.73
73           0.73
Activations Density 0.035%
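
One plausible reading of the density figure, offered as a sketch and not as the dashboard's documented method: the share of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the sample values below are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not state how it is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 20,000 tokens.
acts = [0.0] * 19_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density this low would mean the feature is silent on nearly all tokens, which fits a feature tied to specific numeric tokens.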