INDEX
Explanations
words or prefixes related to something negative or problematic
terms related to unsustainable practices or concepts
Negative Logits
SHIP        -0.72
tsky        -0.71
Dynamics    -0.70
tanks       -0.68
Rams        -0.68
briefs      -0.67
Nanto       -0.67
phrine      -0.67
phases      -0.65
Guardians   -0.64
Positive Logits
aved        1.18
olicited    1.17
avour       1.16
killed      1.11
iders       1.10
atisf       1.07
ided        1.05
oci         1.03
ident       1.02
rep         1.01
Activations Density: 0.015%