Explanations
phrases related to specific concepts or objects, potentially with negative connotations
metaphorical expressions that imply deception, manipulation, or undesirable outcomes
Negative Logits
URA       -0.79
cont      -0.78
rongh     -0.77
alle      -0.76
ゼウス    -0.73
ickets    -0.72
qus       -0.71
Rail      -0.71
rab       -0.71
arel      -0.70
Positive Logits
mentality  1.04
scenario   0.99
approach   0.98
tactic     0.92
moment     0.86
situation  0.85
solution   0.85
fallacy    0.84
maneuver   0.82
tactics    0.82
Activations Density 0.378%