Explanations
The neuron appears to respond to words related to criticism or negativity.
Negative Logits
Token       Logit
agher       -1.16
reon        -1.01
ulhu        -1.00
orate       -0.98
Ancients    -0.95
arians      -0.95
inosaur     -0.95
ICAN        -0.94
orians      -0.94
Mant        -0.94
Positive Logits
Token       Logit
ball        1.58
ening       1.47
hearted     1.41
ener        1.33
grass       1.32
eners       1.25
cover       1.25
palate      1.23
heart       1.21
balls       1.16
Activations Density: 0.873%