INDEX
Explanations
personal reactions to others
Negative Logits
$(\        0.45
ISING      0.42
DISC       0.40
$(\        0.40
CHAP       0.39
にあった    0.38
Fluid      0.38
ставить    0.38
$\{\       0.37
Plasma     0.37
POSITIVE LOGITS
speechless     0.70
intrigued      0.68
rethink        0.66
shudder        0.61
uncomfortable  0.59
uneasy         0.58
berpikir       0.57
感到            0.57
ashamed        0.57
breathless     0.57
Activations Density 0.021%