Explanations
words related to theatrical concepts or performance
Negative Logits
Token   Logit
es      -0.25
hole    -0.24
halt    -0.24
ho      -0.24
hb      -0.23
t       -0.23
hora    -0.22
h       -0.22
hum     -0.22
hoff    -0.22
Positive Logits
Token      Logit
ting       0.31
tempt      0.29
rices      0.26
rice       0.26
tempts     0.25
ernal      0.25
te         0.25
uration    0.23
ransition  0.23
ronic      0.23
Activations Density: 0.084%