Explanations
phrases related to decision-making and risk assessment
references to treatments and their effectiveness or consequences
Negative Logits
ãĤ¼ãĤ¦ãĤ¹       -0.70
ï¸ı             -0.68
ãĥ©ãĥ³          -0.64
DragonMagazine  -0.62
)]              -0.60
gypt            -0.60
iane            -0.57
rious           -0.53
Bowen           -0.52
Sharma          -0.52
Positive Logits
at              1.26
altogether      1.21
whatsoever      1.20
simultaneously  1.05
At              1.02
concurrently    0.99
At              0.97
yet             0.85
respectively    0.84
at              0.84
Activations Density 0.770%