Explanations
spending time on activities
Negative Logits
Token        Value
semangat     0.78
memerlukan   0.76
Through      0.73
Through      0.72
käyttö       0.72
zmienia      0.71
हानि          0.71
принимает    0.71
doloribus    0.70
wszyst       0.70
Positive Logits
Token        Value
lobbying     0.95
researching  0.93
arguing      0.89
研发          0.88
searching    0.87
frivol       0.87
explaining   0.86
lobbyists    0.85
debating     0.84
devoted      0.82
Activations Density: 0.087%