Explanations
phrases referring to consequences and effects related to various societal issues
Negative Logits
ーン  -0.16
เปอร  -0.15
borg  -0.15
ernals  -0.14
layan  -0.14
andas  -0.14
кра  -0.14
ansom  -0.13
appa  -0.13
اقة  -0.13
Positive Logits
consequences  0.18
AVOR  0.17
effects  0.16
-effects  0.15
uries  0.15
zug  0.15
Effects  0.15
effects  0.15
-effect  0.15
Presence  0.14
Activations Density: 0.100%
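
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. The dashboard itself does not state this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 1000 token positions.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under this reading, a density of 0.100% would mean the feature is active on roughly one token in a thousand, which fits a sparse feature tied to a narrow set of words such as "consequences" and "effects".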