Explanations
how someone behaves or responds
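Negative Logits
получаем  0.40  (Russian: "we get / we receive")
enders  0.39
割引  0.39  (Japanese: "discount")
получать  0.39  (Russian: "to get / to receive")
получили  0.38  (Russian: "received")
キ  0.38  (Japanese katakana syllable "ki")
그거  0.38  (Korean: "that thing")
찾  0.38  (Korean stem of "to find")
받았  0.38  (Korean: "received", past-tense stem)
शर्म  0.37  (Hindi: "shame")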
Positive Logits
whom  0.75
whom  0.68
responds  0.57
behaves  0.57
respond  0.55
してくれる  0.55  (Japanese: "(someone) does for me")
reciproc  0.54
behaving  0.54
behaved  0.52
behave  0.51
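The page does not say how the logit lists were computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and list the tokens with the largest (promoted) and smallest (suppressed) scores. The sketch below shows that ranking with a made-up 2-dimensional toy; `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and real pipelines may also fold in the final LayerNorm scale.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a feature's decoder direction (length d_model) and a
# mapping from each vocabulary token to its unembedding column (length d_model).
Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Direct effect of the feature on each token's logit: dot(decoder_dir, W_U[:, token])."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted (positive) and k most-suppressed (negative) tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # ascending order: most negative score first
    return positive, negative


# Toy example with made-up numbers (not the real model weights).
decoder_dir = [1.0, 0.0]
unembed = {
    "whom": [0.8, 0.0],
    "behaves": [0.6, 0.1],
    "enders": [-0.3, 0.5],
    "割引": [-0.4, 0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('whom', 0.8), ('behaves', 0.6)]
print(negative)  # [('割引', -0.4), ('enders', -0.3)]
```

Under this assumption, a token appears in the positive list because its unembedding points in roughly the same direction as the feature's decoder vector.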
Activation Density: 0.022%
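A density of 0.022% means the feature fires on roughly 22 of every 100,000 tokens. The sketch below assumes density is the percentage of sampled token positions whose activation is strictly above zero. The page states neither the sample nor the threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of sampled token positions where the feature activation is above zero (assumed threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)


# Made-up sample: the feature fires on 22 of 100,000 token positions.
sample = [0.0] * 100_000
for i in range(0, 22_000, 1_000):  # 22 positions: 0, 1000, ..., 21000
    sample[i] = 1.0
print(activation_density(sample))  # 0.022
```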