Explanations
words related to psychological or self-related concepts
terms related to self-related conditions and actions
Negative Logits
Token        Logit
sea          -0.82
nan          -0.71
ugu          -0.71
anwhile      -0.68
KEY          -0.68
hillary      -0.66
estone       -0.66
fml          -0.66
endez        -0.65
eday         -0.64
Positive Logits
Token            Logit
itled            0.69
attribution      0.68
rency            0.68
essed            0.67
blame            0.64
gratification    0.63
rating           0.61
prophecy         0.61
exile            0.61
ihilation        0.60
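Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative tokens. The sketch below assumes that definition; the page itself does not state how its values were computed. The decoder vector and the tiny three-dimensional unembedding are hypothetical toy values, not real model weights.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of one feature,
# assumed to be the dot product of its decoder direction with each token's
# unembedding vector. All vectors below are toy values for illustration only.


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def logit_effects(decoder_dir, unembed):
    """Map each token to decoder_dir . unembedding(token)."""
    return {tok: dot(decoder_dir, vec) for tok, vec in unembed.items()}


def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Hypothetical toy weights (not taken from any real model).
decoder_dir = [1.0, -0.5, 0.0]
unembed = {
    "blame": [0.6, -0.1, 0.3],
    "sea": [-0.7, 0.2, 0.1],
    "rating": [0.5, 0.0, -0.2],
    "hillary": [-0.4, 0.5, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy vectors, "blame" and "rating" land on the positive side and "sea" and "hillary" on the negative side, mirroring the shape of the lists above without reproducing their actual values.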
Activation Density: 0.057%
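Activation density is read here as the fraction of tokens on which the feature fires, i.e. has activation above zero. The page does not spell this definition out, so treat it as an assumption. Under that reading, 0.057% means the feature is active on roughly 1 in 1,750 tokens (1 / 0.00057 ≈ 1,754). The helper below is a hypothetical illustration of the calculation.

```python
# Hypothetical sketch: activation density as the share of tokens whose
# feature activation exceeds a threshold (assumed to be 0.0 here).


def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: the feature fires on 1 of 4 tokens, so density is 25%.
print(activation_density([0.0, 0.0, 0.3, 0.0]))

# Converting the reported 0.057% into "one activation every N tokens".
reported = 0.057 / 100
print(round(1 / reported))
```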