Explanations
acknowledging feelings and offering support
Negative Logits
Token            Value
Psik             0.44
psicologia       0.43
amused           0.42
Caution          0.40
楽しみ           0.40
mischiev         0.40
geworden         0.39
变为             0.39
Caution          0.38
Psychological    0.38
Positive Logits
Token            Value
feelings         0.62
Whatever         0.60
whatever         0.57
bottling         0.56
Whatever         0.54
acknowledging    0.54
remember         0.53
whatever         0.52
acknowledge      0.52
Feelings         0.52
Activations Density: 0.089%