Explanations
words related to decision-making or judgment
nouns and concepts related to responsibilities, dilemmas, and challenges
Negative Logits
Token     Logit
poon      -0.72
vez       -0.65
Eva       -0.64
otonin    -0.63
cill      -0.59
vic       -0.58
urses     -0.58
cot       -0.58
Prix      -0.57
GBT       -0.57
Positive Logits
Token         Logit
varies        0.89
remains       0.75
is            0.74
isn           0.73
arises        0.72
differs       0.71
seemed        0.71
hasn          0.70
seems         0.70
implication   0.69
Activations Density: 0.347%