INDEX
Explanations
terms related to openness and inclusive environments
Negative Logits
Opening     -0.27
Opening     -0.26
opening     -0.24
opening     -0.23
-opening    -0.21
opened      -0.21
opener      -0.20
opened      -0.20
openings    -0.20
opens       -0.19
Positive Logits
-ended      0.36
ended       0.31
-air        0.31
ended       0.29
Ended       0.28
Ended       0.28
baar        0.24
-plan       0.23
-source     0.22
sesame      0.22
Activations Density 0.028%