INDEX
Explanations
phrases related to social and political issues
Negative Logits
Token         Logit
father        -0.43
Tray          -0.42
Scarlett      -0.42
Riverside     -0.41
Mothers       -0.41
Cron          -0.41
live          -0.41
VG            -0.40
Strawberry    -0.40
Patty         -0.40
Positive Logits
Token         Logit
initions      0.50
ilk           0.49
rero          0.45
hooting       0.45
.''.          0.43
ember         0.43
rained        0.43
worldly       0.42
egu           0.42
.�            0.41
Activations Density: 7.763%