INDEX
Explanations
No clear pattern could be detected in the provided activations for neuron 4; further analysis may be needed.
The word "wouldn't" and its variants, suggesting skepticism or hypothetical scenarios.
Negative Logits
ULT           -0.70
Proced        -0.66
Gutenberg     -0.58
PI            -0.57
gaard         -0.57
dimensional   -0.57
Offline       -0.57
Casting       -0.56
Learning      -0.56
Butt          -0.56
Positive Logits
't            1.29
geon          0.97
atically      0.85
terness       0.85
geons         0.82
agy           0.82
acies         0.80
¹             0.79
etsk          0.78
ģĸ            0.77
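The page does not say how the logit lists are produced. A common approach, assumed here, is to project the neuron's output weight vector onto the unembedding matrix and rank tokens by the resulting direct effect on each logit. The sketch below uses toy, made-up weights; `logit_effects`, `top_k`, `w_out` and `w_unembed` are hypothetical names, and a real dashboard may also fold in layer-norm scaling that this ignores.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(w_out: Sequence[float],
                  w_unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Direct effect of the neuron on each token logit: dot(w_out, W_U[:, t]).

    Assumption: w_unembed is laid out as d_model rows by len(vocab) columns.
    """
    d_model = len(w_out)
    effects: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(w_out[i] * w_unembed[i][t] for i in range(d_model))
    return effects


def top_k(effects: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    """Largest effects first if positive, otherwise most negative first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy usage with hypothetical weights (not the real model's).
vocab = ["'t", "ULT", "geon"]
effects = logit_effects([1.0, 0.5], [[1.0, -0.5, 0.3], [0.5, -0.4, 0.8]], vocab)
print(top_k(effects, 1))                  # most promoted token
print(top_k(effects, 1, positive=False))  # most suppressed token
```

Sorting ascending for the negative list puts the most suppressed token first, matching the order in which the lists above are shown.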
Activation Density: 0.017%
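An activation density of 0.017% means the neuron fires on roughly 17 out of every 100,000 tokens. The exact definition used by the dashboard is not given. The minimal sketch below assumes that density is the percentage of token positions whose activation exceeds a threshold of 0. `activation_density` and `threshold` are hypothetical names.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: "firing" means activation > threshold (default 0).
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


# 17 firing positions out of 100,000 gives about 0.017 (percent).
acts = [1.0] * 17 + [0.0] * 99_983
print(activation_density(acts))
```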