INDEX
Explanations
roleplaying prompts or instructions
Negative Logits
remos        0.44
uring        0.43
icher        0.43
uously       0.40
umatic       0.40
unate        0.40
willpower    0.38
hummingbird  0.38
dishonesty   0.38
uen          0.37
Positive Logits
inglés          0.47
ľ               0.46
espécies        0.44
bekerja         0.44
crianças        0.43
bactéri         0.42
Employer        0.42
Великобритании  0.42
advert          0.42
ρ               0.42
Activations Density: 0.009%