Explanations
This neuron looks for instances where something almost, but not entirely, fits or meets expectations.
Phrases indicating uncertainty or hesitation.
Negative Logits
Token       Logit
olan        -0.84
uments      -0.82
selage      -0.73
cius        -0.72
DRAG        -0.67
runtime     -0.65
ERAL        -0.64
ogi         -0.64
lessness    -0.64
rys         -0.62
Positive Logits
Token       Logit
icable      0.85
bothered    0.73
Enough      0.72
spo         0.72
shy         0.69
Enough      0.69
theless     0.68
spoon       0.67
enough      0.67
reunited    0.65
Activations Density: 0.013%
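
To give a feel for how sparse a density of 0.013% is, the sketch below converts the percentage into an expected number of activating tokens. It assumes that density means the fraction of sampled tokens on which the neuron has a nonzero activation; the page does not state this definition. The function name and the one-million-token sample size are hypothetical and chosen only for illustration.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the neuron fires.

    Assumes density_percent is the percentage of tokens with a nonzero
    activation (an assumption, not stated on the page).
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # Hypothetical sample size, chosen only to make the scale concrete.
    n_tokens = 1_000_000
    count = expected_activations(0.013, n_tokens)
    print(f"about {round(count)} activating tokens per {n_tokens:,} tokens")
```

Under that assumption the neuron fires on roughly 130 tokens per million, which fits a narrow feature tied to a small set of words.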