INDEX
Explanations
mentions of "middle-aged" individuals
NEGATIVE LOGITS
edIn        -0.79
atche       -0.77
tics        -0.71
Canaver     -0.69
SIGN        -0.68
orthy       -0.68
pedia       -0.67
cci         -0.67
vernment    -0.66
issance     -0.66
POSITIVE LOGITS
brow        0.91
piece       0.84
uve         0.82
finger      0.77
class       0.75
weights     0.75
school      0.74
pace        0.74
layer       0.73
weight      0.72
Activations Density: 0.030%