INDEX
Explanations
words related to excessive or extreme characteristics, potentially with a negative connotation
terms related to socio-economic concepts and categories
Negative Logits
Metatron    -0.76
Lub         -0.73
showc       -0.71
TEXT        -0.68
HAHA        -0.64
Cong        -0.63
Ack         -0.63
outbreak    -0.61
ebin        -0.61
EStream     -0.61
Positive Logits
cling       0.94
eteen       0.93
phrine      0.87
atinum      0.86
apore       0.86
ice         0.85
vernment    0.84
age         0.83
eln         0.82
iety        0.80
Activations Density 0.031%