INDEX
Explanations
adjectives and nouns related to loyalty and faithfulness
concepts related to loyalty and faithfulness
Negative Logits
Drugs     -0.75
Surgery   -0.63
OUT       -0.62
Drug      -0.62
OPA       -0.60
phrine    -0.59
viruses   -0.59
çīĪ       -0.59
vetoed    -0.59
adish     -0.59
Positive Logits
glers     0.97
itiz      0.94
iciary    0.86
ists      0.85
ist       0.82
actor     0.81
enough    0.80
servant   0.79
ados      0.79
ettes     0.79
Activations Density: 0.026%
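
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes density means "fraction of sampled token positions with an activation above zero", which the page itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is
    active. This definition is not confirmed by the source page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, with the feature active on 3 of them.
acts = [0.0] * 10_000
for i in (12, 4_500, 9_999):
    acts[i] = 1.7

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.026% would correspond to the feature firing on roughly 26 of every 100,000 sampled token positions.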