INDEX
Explanations
phrases related to human society or the human condition
negative sentiments and references towards humanity
Negative Logits
Token      Logit
urations   -0.81
Bey        -0.71
stood      -0.69
FU         -0.69
olic       -0.68
Sym        -0.64
skim       -0.64
abet       -0.62
tein       -0.62
excerpts   -0.62
Positive Logits
Token          Logit
ankind         0.88
beings         0.86
civilisation   0.77
humanity       0.74
icity          0.73
humankind      0.73
civilization   0.68
itably         0.68
idis           0.68
footprint      0.67
Activation Density: 0.023%