INDEX
Explanations
words related to authority, control, and decision-making
terms related to scrutiny, investigation, and pressure associated with authority or oversight
Negative Logits
behind      -0.73
ubi         -0.71
arcity      -0.66
ndra        -0.65
ffen        -0.64
ocre        -0.61
vana        -0.61
eport       -0.60
dwarves     -0.60
pell        -0.59
Positive Logits
guise       0.99
ausp        0.96
microscope  0.80
assumption  0.77
ãģĨ         0.76
é¾įå¥ij士    0.75
pretext     0.74
premise     0.70
ĺħ          0.70
izon        0.68
Activations Density 0.184%