Explanations
references to social issues and political topics
Negative Logits
mathemat     -0.76
sacrific     -0.72
fortun       -0.68
loopholes    -0.66
scattering   -0.64
elig         -0.64
myster       -0.64
elim         -0.62
jog          -0.61
SERV         -0.61
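Positive Logits
ï¸ı          1.39   (byte-level token for U+FE0F, the emoji variation selector)
ski           0.89
mental        0.86
tracks        0.86
s             0.84
sure          0.82
ttle          0.81
esc           0.81
ship          0.80
ï¸            0.80   (partial byte-level token: UTF-8 bytes EF B8)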
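The two lists read like the k most negative and k most positive entries of a per-token logit-effect vector. The sketch below shows only that selection step. It assumes the effects are already available as a token-to-value mapping. The function name `top_logits` and the choice of `k` are hypothetical, and the sketch does not claim to reproduce how the dashboard computes the values themselves.

```python
from typing import Dict, List, Tuple


def top_logits(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper: assumes logit effects are given as a token -> value dict.
    """
    # Sort ascending by value, so the most negative entries come first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    most_negative = ranked[:k]
    # Reverse the ascending order so the most positive entry comes first.
    most_positive = list(reversed(ranked))[:k]
    return most_negative, most_positive


# Example using a handful of values from the lists above.
effects = {"mathemat": -0.76, "sacrific": -0.72, "ski": 0.89, "mental": 0.86, "sure": 0.82}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('mathemat', -0.76), ('sacrific', -0.72)]
print(pos)  # [('ski', 0.89), ('mental', 0.86)]
```

Over a full vocabulary, k is small relative to the number of tokens, so the two lists never overlap.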
Activation Density: 0.772%
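Suppose activation density means the percentage of token positions on which the feature activates, i.e. has an activation above zero. Under that reading, 0.772% means the feature fires on roughly 8 of every 1,000 tokens. This definition is an assumption: the threshold and the token sample behind the dashboard figure are not stated here. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes "active" means activation > threshold (default 0).
    """
    if len(activations) == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A feature active on 1 of 4 token positions has a density of 25%.
print(activation_density([0.0, 1.3, 0.0, 0.0]))  # 25.0
```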