Explanations
feel good, connected, comfortable
Negative Logits
Feeling  0.72
असह  0.72
amiss  0.72
onset  0.72
feeling  0.72
discomfort  0.70
Direction  0.68
ಅಪ  0.68
覺得  0.66
overwhelm  0.66
Positive Logits
Control  0.74
年轻  0.73
ஆண்டுக  0.73
主办  0.73
एसआय  0.69
CONTROL  0.69
álního  0.69
possible  0.69
妹  0.68
mögliche  0.67
Activations Density 0.033%