INDEX
Explanations
references to social hierarchy and class distinctions
Negative Logits
bats               -0.86
DAY                -0.78
urbed              -0.73
risome             -0.71
GUI                -0.71
kees               -0.70
Merit              -0.69
thirst             -0.67
indust             -0.67
natureconservancy  -0.65
Positive Logits
endment            0.81
imester            0.74
KO                 0.70
onwards            0.68
Misty              0.67
enium              0.65
KO                 0.64
foremost           0.63
approximation      0.63
eve                0.62
Activations Density 0.082%