Explanations
phrases related to effort or dedication
phrases related to effort and time investment
Negative Logits
helicop        -0.78
ittle          -0.74
agre           -0.73
\\\\\\\\       -0.72
idden          -0.72
vertisement    -0.70
livest         -0.69
etheless       -0.69
convol         -0.68
reditary       -0.67
Positive Logits
oneself        0.64
to             0.64
ANA            0.64
ï¸ı            0.61
him            0.61
entails        0.60
for            0.59
.''.           0.59
GW             0.58
advantage      0.58
Activations Density: 0.042%