INDEX
Explanations
words related to challenges to the existing system or status quo
concepts related to societal norms and critiques of the status quo
Negative Logits
anmar       -0.64
undai       -0.64
uddenly     -0.63
ETHOD       -0.63
inav        -0.60
Zup         -0.59
Mayhem      -0.58
zbek        -0.57
PROV        -0.57
ometimes    -0.56
Positive Logits
of          0.91
iest        0.86
iness       0.80
thereof     0.78
lessness    0.77
quo         0.75
ifice       0.75
iveness     0.72
fallacy     0.72
dimension   0.69
Activations Density 0.379%