Explanations
adjectives related to being twisted or distorted
instances of the words "twisted," "warped," and related terms indicating distortion
Negative Logits
alty          -0.86
abet          -0.85
ciation       -0.84
¯¯¯¯          -0.76
ILA           -0.74
worthiness    -0.73
cial          -0.72
igraph        -0.71
cially        -0.71
gat           -0.70
Positive Logits
twisted       1.05
twist         0.94
twisting      0.93
twists        0.81
Twisted       0.77
adolesc       0.76
Hollow        0.74
lengths       0.73
imagin        0.70
intertw       0.69
Activations Density 0.020%