INDEX
Explanations
phrases related to social issues and societal structures
Negative Logits
ob          -0.79
ILY         -0.73
kamp        -0.71
onis        -0.67
opoly       -0.66
OB          -0.66
ointment    -0.66
achus       -0.65
osate       -0.65
verend      -0.63
Positive Logits
pesky       1.04
wishing     0.98
who         0.93
kinds       0.84
who         0.80
interested  0.79
wanting     0.75
unfamiliar  0.75
favoring    0.73
fateful     0.72
Activations Density 0.282%