Explanations
physical constriction and distress
Negative Logits
smiling      0.91
smiles       0.77
ooky         0.75
influencing  0.74
blinking     0.73
reclining    0.72
waving       0.72
微笑 (Chinese, "smile")  0.71
influ        0.71
influ        0.71
Positive Logits
tight        0.86
tighten      0.83
const        0.83
constricted  0.82
سنگ (Persian, "stone")  0.77
tightness    0.76
Tight        0.75
traitor      0.73
حلقه (Persian, "ring, loop")  0.73
हाई (Hindi transliteration of "high")  0.72
Activation density: 0.126%