Explanations
focus on, ability to
Structured, instructional explanations and advice (guide-like, step-by-step, or “breakdown”-style content typical of assistant responses).
Negative Logits
Alloys  0.40
slump  0.39
სას  0.38
пня  0.38
习  0.37
condemn  0.37
swelling  0.37
tee  0.37
cargas  0.36
unatt  0.36
Positive Logits
خدمات  0.45
российских  0.43
ಕಾಣ  0.42
تتم  0.42
נית  0.42
아마  0.42
EDY  0.42
dishwasher  0.42
RUB  0.41
AMENTO  0.41
Activations Density 14.575%