INDEX
Explanations
phrases related to the actions of simulating gems
opening parentheses
Negative Logits
sis        -0.80
Hol        -0.75
ac         -0.73
inn        -0.72
oys        -0.70
ican       -0.70
oot        -0.70
isen       -0.68
ophobia    -0.67
ador       -0.67
Positive Logits
uncond     0.88
clud       0.70
locals     0.69
Palestin   0.68
oun        0.68
enhagen    0.67
sidew      0.67
neighb     0.66
abouts     0.65
crossings  0.65
Activations Density 0.000%