INDEX
Explanations
phrases related to potential dangers or risks
phrases related to potential harm or risks associated with actions or events
Negative Logits
iple          -0.63
Latest        -0.59
Cosponsors    -0.58
Joint         -0.55
undrum        -0.54
pioneered     -0.53
Patreon       -0.53
arij          -0.53
ortium        -0.52
displayText   -0.51
Positive Logits
)).           0.99
]."           0.87
'."           0.87
)."           0.86
.'"           0.82
".            0.75
).[           0.74
.''.          0.74
?".           0.73
.).           0.72
Activations Density: 3.352%