Explanations
This neuron detects mentions of the phrase “locker room.”
Negative Logits
Philly
-0.07
無
-0.07
_split
-0.06
átu
-0.06
orno
-0.06
Πο
-0.06
gó
-0.06
:text
-0.06
tooltips
-0.06
v
-0.06
Positive Logits
classic
0.08
contextual
0.07
.XtraPrinting
0.07
dva
0.06
Casino
0.06
//{↵
0.06
charisma
0.06
harma
0.06
parser
0.06
Зак
0.06
Activations Density 0.002%