Explanations
bragging
The neuron detects boastful or self-aggrandizing language (words expressing bragging or pride).
Negative Logits
Token       Logit
authors     -0.06
.path       -0.06
futuro      -0.06
kişisel     -0.06
competing   -0.06
bottled     -0.06
Από         -0.06
params      -0.06
hypers      -0.06
ayrı        -0.06
Positive Logits
Token       Logit
Vanguard    0.07
Kare        0.07
alyze       0.07
FixedSize   0.06
바이        0.06
Senator     0.06
avy         0.06
分析        0.06
.Report     0.06
podařilo    0.06
Activations Density: 0.179%