INDEX
Explanations
mentions of specific episodes in a TV series
references to specific episodes in a series
NEGATIVE LOGITS
forts     -0.78
lying     -0.70
paces     -0.64
CONS      -0.63
minist    -0.62
isse      -0.62
helm      -0.62
ror       -0.61
bos       -0.61
RO        -0.60
POSITIVE LOGITS
episode       3.63
episode       2.61
episodes      2.58
Episode       2.35
Episode       2.15
isode         1.66
isodes        1.62
chapter       1.45
installment   1.33
podcast       1.33
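The card does not say how the positive and negative logit values were computed. Dashboards of this kind commonly obtain them by projecting the feature's decoder direction through the model's unembedding matrix and listing the most positive and most negative entries. The sketch below assumes that method; `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the toy inputs are illustrative values, not this feature's data.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by how much a feature direction pushes their logits.

    Assumption: decoder_dir has shape (d_model,), W_U has shape
    (d_model, vocab_size), and vocab is a list of token strings.
    """
    effects = decoder_dir @ W_U  # one logit effect per vocabulary token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[3.0, -1.0, 0.5, 2.0],
                [0.0, 0.0, 0.0, 0.0]])
vocab = ["a", "b", "c", "d"]
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive, negative)
```

Under this assumption, a high value such as 3.63 for "episode" would mean the feature's direction strongly raises that token's logit.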
Activations Density 0.014%
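Activation density is usually read as the share of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly greater than zero and that per-token activations arrive as a flat array; `activation_density` and the sample sizes are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    return float(np.mean(np.asarray(activations) > threshold))


# Hypothetical sample: 1,000,000 tokens, of which 140 activate the feature.
acts = np.zeros(1_000_000)
acts[:140] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```

At 0.014%, the feature fires on roughly 14 of every 100,000 tokens, which fits a narrow concept such as references to specific episodes.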