INDEX
Explanations
instances where the concept of inactivity or lack of action is mentioned
references to the concept of inaction or doing nothing
Negative Logits
aten        -0.81
ipel        -0.66
Sect        -0.62
roups       -0.62
ardi        -0.61
aturated    -0.60
Pand        -0.60
Hacker      -0.59
Fog         -0.58
passages    -0.58
Positive Logits
wrong         1.07
proactive     0.99
else          0.97
differently   0.93
wrong         0.91
drastic       0.86
remotely      0.83
meaningful    0.82
unethical     0.82
stupid        0.81
Activations Density: 0.048%