INDEX
Explanations
phrases related to instructions or step-by-step guides
instances of the word "there" and phrases indicating existence or presence
Negative Logits
destro        -0.65
chuk          -0.61
aura          -0.57
{{            -0.57
Knight        -0.56
AMI           -0.56
è£            -0.55
sylv          -0.55
chnology      -0.54
Eth           -0.54
Positive Logits
ISI           0.55
ye            0.52
netflix       0.50
othe          0.49
bra           0.49
rius          0.48
)!            0.47
Palestinians  0.47
Andy          0.47
mods          0.46
Activations Density 0.460%