INDEX
Explanations
political and economic terms or contexts
Negative Logits
Downloadha     -0.58
allegedly      -0.45
adh            -0.44
dared          -0.44
supposedly     -0.43
guiActiveUn    -0.43
ruining        -0.43
Paste          -0.42
>>\            -0.41
recently       -0.41
Positive Logits
oneself        0.73
ourselves      0.72
cipled         0.60
anew           0.57
yourself       0.54
yourselves     0.54
sacrific       0.53
meaningful     0.52
ardless        0.51
humility       0.49
Activations Density: 19.095%