INDEX
Explanations
words related to justice or justifications
occurrences of the word "just" and its variations, indicating a focus on justifications or fairness
Negative Logits
cous      -0.72
luster    -0.69
uberty    -0.69
yip       -0.68
xual      -0.67
pora      -0.66
Prelude   -0.66
wcs       -0.63
monop     -0.61
yang      -0.61
Positive Logits
ifiable      1.40
ifications   1.33
ices         1.12
if           1.01
ifi          0.98
ified        0.97
icia         0.93
ifiers       0.93
ification    0.93
IFIED        0.92
Activations Density 0.087%
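
The dashboard does not define the density figure. A common reading is the fraction of sampled token positions on which the feature fires at all, and the sketch below computes it under that assumption. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (here: above-threshold) activation. The source does not confirm this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 9 of which activate the feature.
sample = [0.0] * 10_000
for i in range(9):
    sample[i * 1000] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.087% would mean the feature is active on roughly 9 of every 10,000 sampled tokens, which fits a feature tied to a narrow family of tokens.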