Explanations
adjectives and verbs related to expressing opinions or attitudes
terms related to strictness and transparency in decision-making
Negative Logits
ioxide      -0.74
brance      -0.68
aleb        -0.66
anyon       -0.65
vanishing   -0.64
ruction     -0.64
McDonnell   -0.63
Ranked      -0.61
tein        -0.61
ogg         -0.60
Positive Logits
enough      0.81
ーク        0.76
enough      0.74
ceptive     0.72
reacting    0.71
looking     0.71
ergic       0.70
minded      0.70
atively     0.70
towards     0.68
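The dashboard does not say how the Negative Logits and Positive Logits lists are computed. A common approach, assumed here and not confirmed by the source, is to project the feature's decoder direction onto each token's column of the model's unembedding matrix and then rank tokens by the resulting score. The sketch below uses toy data, and the names `logit_effects` and `top_and_bottom` are hypothetical helpers written for illustration.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding column (assumed method, toy sizes).

    decoder_dir: list of d_model floats
    unembed:     d_model rows, each a list of vocab_size floats
    vocab:       list of vocab_size token strings
    """
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    positive = ranked[-k:][::-1]  # largest scores first
    negative = ranked[:k]         # most negative scores first
    return positive, negative


# Toy example: d_model = 2, vocabulary of three hypothetical tokens.
vocab = ["a", "b", "c"]
unembed = [[1, 0, -1],
           [0, 1, 0]]
decoder_dir = [2, 1]
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(pos, neg)
```

With a real model, `vocab` would be the tokenizer's full vocabulary, which is why fragments such as "ioxide" or "ergic" can show up: they are subword tokens, not whole words.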
Activation Density: 0.205%
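Assuming activation density means the fraction of token positions where the feature's activation exceeds zero (the dashboard does not define it), a density of 0.205% means the feature fires on roughly 2 of every 1,000 tokens. The hypothetical helper `activation_density` below sketches that calculation. The `threshold` parameter is also an assumption.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`
    (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sequence of 10 activations: the feature fires on 2 of them.
acts = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2]
print(f"{activation_density(acts):.3%}")
```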