Explanations
words related to distraction or shifts in focus
specific adjectives and descriptors related to color and intensity
Negative Logits
Token            Logit
76561            -0.77
Akin             -0.77
commemorate      -0.65
rium             -0.64
Gandhi           -0.64
Grassley         -0.63
0001             -0.63
Fav              -0.63
whoever          -0.62
advertisement    -0.61
Positive Logits
Token            Logit
ness             0.98
haired           0.93
hearted          0.92
coat             0.91
lining           0.88
colour           0.86
skinned          0.85
NESS             0.82
headed           0.81
packed           0.80
Activations Density 0.160%