INDEX
Explanations
words related to moral principles, ethics, and judgement
terms related to sanctity and legality
Negative Logits
Kingdoms     -0.71
UNCH         -0.70
beit         -0.68
Viking       -0.64
curfew       -0.60
GS           -0.60
Sandwich     -0.60
Strikes      -0.60
tongue       -0.59
SPA          -0.59
Positive Logits
imon         1.21
ified        1.16
ification    1.07
uitous       1.05
ifying       1.03
imony        1.01
itary        1.00
ific         1.00
eness        0.98
rets         0.93
Activations Density 0.016%