INDEX
Explanations
The neuron fires on terms expressing loyal companionship—words like “friend,” “best friend,” “companion,” and related affectionate descriptors.
Negative Logits
choir       -0.07
'It         -0.07
δο          -0.07
denn        -0.06
.file       -0.06
تع          -0.06
.started    -0.06
agas        -0.06
heap        -0.06
Won         -0.06
Positive Logits
Pine         0.07
Statistical  0.07
Knife        0.06
linestyle    0.06
substantial  0.06
خطر          0.06
狠           0.06
Whites       0.06
Bryce        0.06
_inode       0.06
Activations Density 0.029%