INDEX
Explanations
phrases or sentences referring to a contrasting group or entity labeled as "everyone else."
references to societal perspectives or collective sentiments
Negative Logits
rogram        -0.72
essee         -0.69
itor          -0.66
atre          -0.66
statement     -0.63
ppel          -0.61
umerable      -0.61
forestation   -0.61
rought        -0.61
ukong         -0.59
Positive Logits
worldly       0.93
except        0.82
alike         0.73
else          0.67
foss          0.67
nodd          0.66
revolves      0.65
imaginable    0.64
iah           0.64
mattered      0.63
Activations Density: 0.034%