INDEX
Explanations
Depressed and other phrases
Instructions that define or emphasize a persona, style, or behavior (often quoted or capitalized, and accompanied by roleplay cues).
Negative Logits
based          0.41
significantly  0.41
related        0.40
based          0.39
significant    0.38
use            0.36
using          0.35
comprise       0.35
trong          0.35
specific       0.34
Positive Logits
matahari       0.38
प्यार           0.34
coração        0.34
ساعة           0.34
Liverpool      0.33
улы            0.33
kisah          0.33
ﺷ              0.33
жизни          0.33
жизнь          0.33
Activations Density 0.157%