Explanations
distractions
The neuron detects mentions of distraction or diversion (e.g., “distracts,” “distraction”).
Negative Logits
.options        -0.07
contemplating   -0.07
Hamilton        -0.07
graffiti        -0.07
express         -0.06
address         -0.06
Steering        -0.06
Deutsche        -0.06
pioneer         -0.06
entails         -0.06
Positive Logits
.Ed             0.07
происходит      0.06
foi             0.06
มกราคม          0.06
。不             0.06
.Fat            0.06
�               0.06
ilmek           0.06
Decompiled      0.06
اورزی           0.06
Activations Density 0.065%