Explanations
words related to consequences of actions and potential risks
phrases that express risk and potential increases related to various contexts
Negative Logits
books  -0.76
kef  -0.74
icians  -0.73
zag  -0.72
bl  -0.71
english  -0.69
mates  -0.69
¾  -0.67
ĸļ  -0.67
cients  -0.66
Positive Logits
exponentially  0.93
awareness  0.89
intensity  0.85
likelihood  0.82
elevation  0.82
efficiency  0.81
availability  0.80
Capacity  0.79
capacity  0.79
visibility  0.79
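The negative and positive lists are the two ends of one ranking of vocabulary tokens by this feature's logit score. The page does not say how those scores are computed, so the sketch below simply assumes they are already available as a plain token-to-score dict. The helper name `top_and_bottom` is hypothetical, and the sample values are a few entries copied from the lists.

```python
def top_and_bottom(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring tokens.

    Hypothetical helper: assumes scores are precomputed per token.
    """
    # Sort all tokens from highest to lowest score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    # Take the k lowest and list the most negative first
    bottom = ranked[-k:][::-1]
    return top, bottom


scores = {
    "exponentially": 0.93,
    "awareness": 0.89,
    "intensity": 0.85,
    "books": -0.76,
    "kef": -0.74,
    "icians": -0.73,
}
top, bottom = top_and_bottom(scores, 2)
print(top)     # [('exponentially', 0.93), ('awareness', 0.89)]
print(bottom)  # [('books', -0.76), ('kef', -0.74)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other. If `k` exceeds half the vocabulary, the two slices overlap.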
Activations Density: 0.143%
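One common reading of activation density is the share of tokens on which the feature's activation is above zero. The page does not define it, so treat that definition as an assumption. Under it, 0.143% works out to about 143 activating tokens per 100,000. The sketch below uses a hypothetical helper `activation_density` and made-up activation values chosen to reproduce that figure.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of "activation density"; not taken from the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens where the feature fires (activation strictly above threshold)
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, of which 143 have a positive activation
acts = [1.5] * 143 + [0.0] * (100_000 - 143)
print(f"{activation_density(acts):.3%}")  # 0.143%
```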