Explanations
asking for or giving recommendations
Negative Logits
承诺            0.40
σημαν           0.39
требований      0.39
必定            0.39
Anforderungen   0.38
promised        0.37
शिकायतों        0.35
決定            0.35
Determine       0.35
anta            0.34
Positive Logits
recomend        0.60
wholeheartedly  0.59
你去            0.55
ways            0.54
improvements    0.54
strongly        0.54
fortement       0.53
recommended     0.53
recommend       0.53
recommended     0.52
Activations Density 0.012%