Explanations
phrases related to policies, guidance, or official positions
Negative Logits
mouth     -0.74
du        -0.72
flush     -0.69
iband     -0.67
zon       -0.67
quer      -0.66
nice      -0.66
eware     -0.65
zip       -0.65
named     -0.65
Positive Logits
expectations   0.92
tradition      0.89
regard         0.84
¥µ             0.82
ideals         0.81
traditions     0.79
regards        0.79
what           0.76
prevailing     0.75
ours           0.75
Activations Density: 0.075%