INDEX
Explanations
punctuation
This neuron detects words and phrases used for disclaiming, denying, or clarifying (e.g., “no,” “denied,” “did nothing,” “clarifies”).
Negative Logits
З
-0.07
legs
-0.07
break
-0.07
wd
-0.07
cookie
-0.06
Effects
-0.06
Hud
-0.06
89
-0.06
password
-0.06
RG
-0.06
Positive Logits
_IOC
0.07
θέση
0.07
�
0.07
.initializeApp
0.07
ballo
0.06
raz
0.06
aVar
0.06
zač
0.06
){↵
0.06
?>><?
0.06
Activations Density 0.041%