INDEX
Explanations
interpersonal relationships
emotional expressions and gestures in romantic contexts.
The neuron is detecting speaker‐turn labels and character identifiers in the dialogue (e.g. tokens like NAME_1, NAME_2, and header/ID markers).
Negative Logits
trs     -0.07
,j      -0.07
oggi    -0.07
={}     -0.07
ВС      -0.07
برخ     -0.06
IVEN    -0.06
人の    -0.06
ốc      -0.06
orage   -0.06
Positive Logits
добавить     0.07
.Controls    0.07
----------↵  0.07
ड            0.06
START        0.06
banning      0.06
noci         0.06
вещ          0.06
iropr        0.06
creds        0.06
Activations Density 0.045%