Explanations
phrases related to critique or negative judgment
instances of the word "poor" indicating negative quality or conditions
Negative Logits
ategory     -0.83
CU          -0.78
ULAR        -0.75
inct        -0.73
kefeller    -0.72
thus        -0.70
auer        -0.69
TI          -0.69
cade        -0.69
wcsstore    -0.69
Positive Logits
luck        0.88
grades      0.84
luck        0.82
performers  0.75
souls       0.73
quality     0.72
imitation   0.71
dies        0.70
bastard     0.70
ãģį         0.69
Activation Density: 0.015%