Explanations
phrases related to sudden intense events or actions
descriptions of traumatic or violent events
Negative Logits
20439         -0.85
ゴン          -0.81
Achieve       -0.72
Inher         -0.71
annually      -0.70
Patreon       -0.69
yrights       -0.69
endeavors     -0.68
Architects    -0.68
ortium        -0.67
Positive Logits
screaming     1.02
..."          1.02
panicked      1.00
yelling       0.99
…"            0.97
[             0.95
['            0.93
luckily       0.93
,'"           0.90
kinda         0.90
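The page lists these token scores without saying how they are derived. A common convention for SAE feature dashboards, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto the model's unembedding matrix and report the tokens with the most positive and most negative scores. The sketch below uses plain Python lists. `logit_effects`, `top_and_bottom`, the row/column layout of `unembed`, and the toy numbers are all hypothetical, and a real dashboard may also apply layer-norm scaling or other normalization before ranking.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Hypothetical layout: decoder_dir has length d_model and
    unembed is d_model rows by len(vocab) columns.
    """
    d_model = len(decoder_dir)
    scores: Dict[str, float] = {}
    for j, tok in enumerate(vocab):
        # Direct contribution of this feature to token j's logit.
        scores[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
    return scores


def top_and_bottom(scores: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, three-token vocabulary.
    scores = logit_effects([1.0, -0.5],
                           [[1.0, -0.5, 0.2],
                            [0.0, 1.0, 0.4]],
                           ["screaming", "annually", "kinda"])
    top, bottom = top_and_bottom(scores, 1)
    print(top)     # [('screaming', 1.0)]
    print(bottom)  # [('annually', -1.0)]
```

Ranking by a single dot product keeps the sketch independent of any particular model library; with real weights the same loop would run over the full vocabulary.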
Activation Density: 0.451%
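Activation density is typically the share of token positions on which the feature fires, so 0.451% works out to roughly 4.5 firing tokens per 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero, measured over some sample of tokens. The function name `activation_density` and its `threshold` parameter are illustrative, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 5 firing positions out of 1,000 tokens.
    acts = [0.0] * 995 + [1.2] * 5
    print(activation_density(acts))  # 0.5
```

A sparse feature like this one fires on only a small fraction of tokens, which is why its density is reported as a fraction of a percent.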