Explanations
words related to self-inflicted actions or conditions
Negative Logits
sea        -0.83
estone     -0.76
endez      -0.71
ugu        -0.68
KEY        -0.68
eday       -0.67
Collider   -0.66
anwhile    -0.64
swick      -0.64
staff      -0.63
Positive Logits
itled           0.72
attribution     0.69
essed           0.66
rency           0.65
prophecy        0.65
rating          0.65
destruct        0.64
rifice          0.64
iency           0.63
gratification   0.63
Activations Density 0.054%