INDEX
Explanations
references to musicals and performances, particularly those involving female characters
Negative Logits
Token     Logit
rov       -0.16
bote      -0.15
çIJĨ       -0.14
circus    -0.14
owie      -0.14
Circus    -0.14
cir       -0.14
seed      -0.14
Cir       -0.14
aris      -0.14
Positive Logits
Token       Logit
Sharp       0.24
Sharp       0.24
Troy        0.21
sharp       0.17
Wildcats    0.17
principal   0.16
East        0.16
Nation      0.16
exert       0.16
du          0.16
Activations Density 0.005%