Explanations
adjectives and verbs related to challenges, difficulties, and limitations
references to obstacles or challenges in a societal or technological context
Negative Logits
abad      -0.67
enhagen   -0.65
icipated  -0.63
deen      -0.62
ichita    -0.59
          -0.57
oway      -0.56
elcome    -0.56
Yad       -0.55
ilitary   -0.54
Positive Logits
ours         0.79
inefficient  0.75
detriment    0.73
innovate     0.72
harm         0.72
ourselves    0.71
destructive  0.71
outweigh     0.69
destruct     0.68
trivial      0.68
Activations Density 0.985%