INDEX
Explanations
words related to control and authority
terms related to control and authority
Negative Logits
partName    -0.64
ggles       -0.64
pires       -0.58
»Ĵ          -0.56
urry        -0.56
pione       -0.56
azes        -0.54
guyen       -0.54
whisper     -0.53
minist      -0.53
Positive Logits
because     0.96
whereas     0.89
because     0.83
.           0.78
despite     0.76
".          0.73
"â̦         0.71
unfairly    0.71
although    0.71
regardless  0.71
Activations Density 1.340%