INDEX
Explanations
Neuron 4 appears to respond to the prefix "Dis", which usually signals negation or disruption.
words or phrases that denote various forms or disciplines of behavior or compliance
Negative Logits
hetti         -0.70
glers         -0.68
OPLE          -0.67
Kinnikuman    -0.64
imately       -0.63
Scotia        -0.62
Juliet        -0.62
Metatron      -0.62
Osw           -0.61
Hanna         -0.61
Positive Logits
cipl          1.21
claimer       1.19
ruption       1.19
qus           1.14
comfort       1.14
rup           1.14
cerning       1.12
enfranch      1.11
ciples        1.11
burse         1.10
Activations Density 0.020%