Explanations
animal actions and outcomes
Negative Logits
appunto  0.51
Seriously  0.44
Seriously  0.44
なのです  0.43
Donc  0.42
portanto  0.42
म्हणूनच  0.41
właśnie  0.41
именно  0.40
그래서  0.40
Positive Logits
hingegen  0.98
ebenfalls  0.92
natomiast  0.91
similarly  0.86
likewise  0.85
同樣  0.80
同样  0.76
dagegen  0.74
kolei  0.73
こちらも  0.73
Activations Density: 0.022%