INDEX
Explanations
words related to negative or derogatory descriptions of people or actions
terms related to deceitful or manipulative behavior
Negative Logits
neoc          -0.75
lapse         -0.70
Ramadan       -0.66
craving       -0.66
famine        -0.65
theless       -0.64
millennium    -0.64
foremost      -0.63
abundantly    -0.62
inaug         -0.62
Positive Logits
cheon     0.95
hett      0.81
udo       0.79
berus     0.77
itzer     0.76
anut      0.75
uli       0.75
anon      0.74
oslav     0.71
arios     0.70
Activation Density 0.079%