INDEX
Explanations
punctuation and formatting elements, suggesting a focus on conversational or interactive language
Negative Logits
ingham    -0.16
loth      -0.14
ardi      -0.14
yna       -0.13
gap       -0.13
pper      -0.13
Cooke     -0.13
irá       -0.13
iao       -0.13
adress    -0.13
Positive Logits
Episode      0.24
listener     0.23
listeners    0.23
tune         0.22
episode      0.22
Episode      0.22
Segment      0.21
Listener     0.20
Listener     0.20
episode      0.20
Activations Density 0.052%