INDEX
Explanations
words related to criticism and disagreement
phrases related to moral judgments or ethical considerations
Negative Logits
incorpor    -0.84
synerg      -0.79
targeted    -0.75
tyres       -0.74
pigeon      -0.74
spir        -0.73
cloning     -0.73
polio       -0.72
vulner      -0.71
affili      -0.71
Positive Logits
ï¸ı                         1.42
âĶĢâĶĢ                      1.29
ccording                    1.22
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    1.17
âĶĢâĶĢâĶĢâĶĢ                1.13
Because                     1.06
à¼                          1.02
Therefore                   1.02
ICE                         1.00
cffffcc                     1.00
Activations Density 0.145%
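
Several positive-logit tokens above (ï¸ı, âĶĢ, à¼) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The sketch below assumes they use the GPT-2 style byte-to-character mapping, which the page itself does not state, and shows one way to turn such strings back into text. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not part of this dashboard.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> display-character table (assumed scheme)."""
    # Printable bytes are displayed as themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed token back to bytes, then decode those bytes as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that hold only part of a multi-byte character cannot be decoded
    # fully; errors="replace" marks the missing part with U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĶĢ", "ï¸ı", "à¼", "Because"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, âĶĢ would be the box-drawing character U+2500, ï¸ı would be the variation selector U+FE0F, and à¼ would be an incomplete two-byte prefix of a three-byte character, so it cannot be rendered on its own.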