INDEX
Explanations
phrases associated with emergency situations or urgency
actions related to danger or distress in a narrative context
NEGATIVE LOGITS
Token      Logit
Decay      -0.59
depends    -0.58
partName   -0.56
issu       -0.54
bnb        -0.52
Story      -0.51
arij       -0.51
undrum     -0.51
.--        -0.50
consists   -0.50
POSITIVE LOGITS
Token      Logit
)).         0.61
".          0.57
hers        0.56
undet       0.52
".          0.48
instead     0.47
).[         0.47
his         0.47
".[         0.46
nearby      0.46
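A minimal sketch of how ranked lists like the two above are commonly produced for sparse-autoencoder features, assuming each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `logit_effects` and `top_k`, the token names, and all vectors below are hypothetical toy values, not this feature's actual weights.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token as dot(decoder direction, token unembedding vector).

    Assumption: this is the projection used to rank tokens; toy data only.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_k(scores: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k highest-scoring tokens (positive=True) or the k lowest."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ordered[:k]


# Hypothetical 3-dimensional decoder direction and unembedding vectors.
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    "alpha": [1.0, 0.0, 0.2],
    "beta": [0.8, 0.1, 0.0],
    "gamma": [-1.0, 0.5, 0.0],
    "delta": [-0.6, 0.3, 0.1],
}

scores = logit_effects(decoder_dir, unembed)
print("positive:", top_k(scores, 2, positive=True))   # alpha, then beta
print("negative:", top_k(scores, 2, positive=False))  # gamma, then delta
```

Sorting the full score dictionary is fine for a toy vocabulary; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.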
Activation density: 4.785% (fraction of tokens on which the feature fires)
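A minimal sketch of the density figure, assuming it is the percentage of token positions where the feature's activation is above zero. Under that reading, 4.785% means the feature fires on roughly one token in 21. The `threshold` parameter and the activation values below are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 8 token positions (2 are active).
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # prints 25.000%
```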