INDEX
Explanations
This neuron is looking for words related to imperfection or flaws
Terms related to permanence and inevitability
Negative Logits
anwhile             -0.85
wagen               -0.80
guiActiveUnfocused  -0.70
CDs                 -0.68
▬                   -0.67
GOODMAN             -0.66
hops                -0.65
hare                -0.65
WAYS                -0.64
ワン                -0.64
Positive Logits
vious      1.13
ishable    1.13
missible   1.10
manent     1.09
iled       0.89
mented     0.89
bably      0.89
pex        0.87
redict     0.86
igen       0.85
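Logit lists like these are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and keeping the tokens with the most extreme scores. The page does not say how its scores are computed or normalized, so the following is only a minimal sketch that assumes they are this direct logit effect. The function `top_logit_effects`, the arrays `w_out` and `W_U`, and the toy vocabulary are all hypothetical.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: a neuron's logit effect is its output weight vector
    projected through the unembedding matrix (no layer norm, no scaling).
    w_out: (d_model,) hypothetical output weights of one neuron.
    W_U:   (d_model, vocab_size) hypothetical unembedding matrix.
    vocab: list of vocab_size token strings.
    """
    # Effect of one unit of neuron activation on every token's logit.
    scores = w_out @ W_U                  # shape (vocab_size,)
    order = np.argsort(scores)            # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with random weights (illustrative data, not the real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
W_U = rng.normal(size=(d_model, vocab_size))
w_out = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]
neg, pos = top_logit_effects(w_out, W_U, vocab, k=5)
```

Sorting the full score vector once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid the full sort.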
Activation Density: 0.012%
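Activation density is read here as the fraction of token positions on which the neuron fires. As a worked figure, 0.012% is 0.00012, so over 100,000 tokens the neuron would be expected to fire on about 0.00012 × 100,000 = 12 of them. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the page does not state its threshold, and `activation_density` and `expected_firings` are hypothetical helpers.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: the threshold of 0.0 is illustrative, not the dashboard's.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

def expected_firings(density_percent, n_tokens):
    """Expected number of firing tokens given a density in percent."""
    return density_percent / 100 * n_tokens

sample = [0.0, 0.5, 0.0, 0.0, 2.0]              # toy activations
print(activation_density(sample))               # 0.4
print(round(expected_firings(0.012, 100_000)))  # 12
```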