INDEX
Explanations
references to honor and recognition
Negative Logits
Token     Logit
favor     -0.80
color     -0.76
labor     -0.76
ее        -0.74
honor     -0.72
center    -0.69
Ее        -0.68
favors    -0.65
"         -0.65
          -0.64
Positive Logits
Token           Logit
neighbourhoods  1.79
colour          1.78
Colour          1.75
humour          1.74
colours         1.74
COLOUR          1.74
neighbourhood   1.72
honour          1.71
Honour          1.69
tumour          1.69
Activations Density 0.110%