Explanations
mentions of the middle class
Negative Logits
edIn        -0.83
atche       -0.77
Canaver     -0.71
SIGN        -0.68
pedia       -0.67
orthy       -0.66
vernment    -0.65
ulously     -0.65
ESCO        -0.65
icer        -0.64
Positive Logits
brow        0.91
uve         0.79
borough     0.76
piece       0.74
middle      0.71
class       0.71
CLASS       0.70
west        0.70
tone        0.70
stad        0.70
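Dashboards of this kind usually build the two lists by projecting the feature's decoder direction onto the model's unembedding matrix and ranking vocabulary tokens by the resulting logit contribution. The page does not state its exact method, so this is only a minimal sketch under that assumption. `top_logits`, `decoder_dir`, `W_U`, and the toy vocabulary and numbers are hypothetical. Some implementations may also fold in the final layer norm, which this sketch omits.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption (hypothetical layout): decoder_dir has length d_model and
    W_U is d_model x vocab_size, so token j's logit is the dot product of
    decoder_dir with column j of W_U.
    """
    d_model = len(decoder_dir)
    logits = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    # Highest logits first for the positive list, lowest first for the negative list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with made-up numbers (not the values listed above).
vocab = ["middle", "class", "pedia", "brow"]
decoder_dir = [1.0, -0.5]
W_U = [
    [0.7, 0.5, -0.4, 0.9],
    [0.2, 0.0, 0.6, -0.2],
]
pos, neg = top_logits(decoder_dir, W_U, vocab, k=2)
print(pos)
print(neg)
```

Read this way, a token such as "middle" or "class" ranks high because its unembedding vector points in roughly the same direction as the feature's decoder vector.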
Activations Density 0.453%
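A natural reading of activation density is the percentage of sampled tokens on which the feature's activation is nonzero, so 0.453% would mean roughly 4.5 firing tokens per 1,000. The page defines neither the term nor the sample it was measured on. The sketch below assumes that reading, and `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: the feature "fires" on a token when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 1,000 tokens, 5 of which activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
print(activation_density(sample))  # 0.5
```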