Explanations
The neuron is detecting occurrences of the word “moths.”
Negative Logits
outlet     -0.06
flooded    -0.06
okers      -0.06
đình       -0.06
android    -0.06
idine      -0.06
ój         -0.06
ucken      -0.06
طفال       -0.06
randint    -0.06
Positive Logits
irresponsible    0.07
سع               0.07
audits           0.07
canoe            0.06
siti             0.06
Степ             0.06
Zoom             0.06
Drag             0.06
меш              0.06
Atmos            0.06
Activations Density 0.004%