Explanations
phrases related to falseness or imitation
terms indicating deceptive or false representations
Negative Logits
edin            -0.73
Dynamics        -0.67
sugg            -0.61
ials            -0.60
screenings      -0.60
ells            -0.59
drinks          -0.58
Õ               -0.58
rity            -0.58
reservations    -0.57
Positive Logits
judicial        0.84
legal           0.81
pas             0.81
icho            0.78
chal            0.75
cele            0.71
urrection       0.71
pas             0.71
reality         0.71
éĹ              0.69
Activations Density 0.104%