Explanations
The neuron fires on mentions of consuming or experiencing media, on tokens such as “watch(ed),” “read,” “anime,” “movies,” “books,” and “serials.”
Negative Logits
labor          -0.07
.ro            -0.07
メ             -0.07
ra             -0.07
からない       -0.07
pause          -0.07
induction      -0.07
.getAccount    -0.06
parate         -0.06
mad            -0.06

Positive Logits
Decl           0.06
iefs           0.06
GHC            0.06
ansen          0.06
/res           0.06
(CONT          0.06
ández          0.06
,and           0.06
věř            0.06
péri           0.06
Activations Density: 0.132%
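
The density figure is most naturally read as the share of token positions on which the neuron is active. The page does not define it, so the sketch below is an assumption: it treats any activation above a threshold of zero as "active", and both the function name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "active" means strictly greater than `threshold` (0.0 by default).
    The dashboard's exact definition is not stated on the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def as_percent(fraction, digits=3):
    """Format a fraction as a percentage string with a fixed number of decimals."""
    return f"{fraction:.{digits}%}"


# Hypothetical toy sample: one active position out of four.
toy_activations = [0.0, 0.0, 0.7, 0.0]
toy_density = activation_density(toy_activations)
print(as_percent(toy_density, digits=1))
```

Under this reading, a density of 0.132% would mean the neuron is active on roughly 13 of every 10,000 token positions sampled.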