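Negative Logits
Token                                Value
真正的 (Chinese, "genuine")           0.92
Genuine                              0.89
단순 (Korean, "simple")               0.84
Truly                                0.83
genuine                              0.83
verdadero (Spanish, "true")          0.82
verdadera (Spanish, "true", fem.)    0.82
Genuine                              0.80
TRUE                                 0.80
真正 (Chinese, "genuine")             0.79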
Positive Logits
Token                                Value
perceived                            1.05
conformity                           1.01
social                               0.98
supposedly                           0.98
pressured                            0.95
appease                              0.95
allegedly                            0.95
pressure                             0.94
narcissistic                         0.93
Conform                              0.93
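The dashboard does not state how these values are produced. Assuming this is a sparse-autoencoder (SAE) feature page, a common convention is to project the feature's decoder direction through the model's unembedding matrix and list the largest and smallest entries. The sketch below illustrates that assumption only: `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, and the toy numbers are illustrative, not this feature's weights. It keeps the sign of negative effects, whereas the table above prints the negative column without a minus sign, so those values may be magnitudes.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption: the effect is decoder_dir @ W_U, i.e. the feature's decoder
    direction (shape (d_model,)) projected through the unembedding matrix
    (shape (d_model, vocab_size)). All names here are hypothetical.
    """
    effects = decoder_dir @ W_U          # one value per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    top_pos = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    top_neg = [(vocab[i], float(effects[i])) for i in order[:k]]
    return top_pos, top_neg


# Toy example: identity unembedding, so each effect equals the decoder entry.
vocab = ["genuine", "true", "perceived", "social", "pressure"]
W_U = np.eye(len(vocab))
decoder_dir = np.array([-2.0, -1.0, 3.0, 2.0, 0.5])
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('perceived', 3.0), ('social', 2.0)]
print(neg)  # [('genuine', -2.0), ('true', -1.0)]
```

With a real model, `W_U` would be the full unembedding and `k=10` would give two ten-row lists shaped like the tables above.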
Activation Density: 0.344%
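Activation density is usually the share of token positions on which the feature's activation is above zero; under that assumption, 0.344% means the feature fires on roughly 3 to 4 of every 1,000 tokens. A minimal sketch, where the helper `activation_density` and its `threshold` parameter are hypothetical:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold.

    Assumption: "density" means the share of positions with activation > 0;
    the threshold parameter is a hypothetical knob, not part of the source.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: 3 active positions out of 1,000 gives a density of 0.3%.
acts = np.zeros(1000)
acts[[10, 500, 900]] = [0.5, 1.2, 0.1]
print(activation_density(acts))  # 0.3
```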