INDEX
Explanations
phrases indicating apologies or regret
words and phrases related to apologies and advocates
Negative Logits
Token       Logit
Jiu         -0.80
Vader       -0.74
MID         -0.70
Peb         -0.68
Tunis       -0.65
enegger     -0.65
Dortmund    -0.64
Karn        -0.64
passer      -0.64
Ceres       -0.63
Positive Logits
Token       Logit
acy          1.52
etics        1.42
etic         1.41
ates         1.35
etically     1.23
acists       1.18
acies        1.15
otes         1.12
ats          1.03
acist        1.02
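The two lists above are the tokens whose logits this feature pushes down most and up most. The sketch below shows how such lists can be produced once a per-token logit effect is available. It is a hypothetical helper (`top_logits` is not part of any dashboard API), and it assumes the effects have already been computed, for example by projecting the feature's decoder direction onto the unembedding. That projection step is not shown, and the dashboard does not state its exact method.

```python
# Hypothetical sketch: split a feature's per-token logit effects into the
# k most negative and k most positive entries, as in the lists above.
# Assumes the effects are precomputed; how they are computed is not shown.

def top_logits(effects: dict, k: int) -> tuple:
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]            # smallest effects first
    positive = ranked[::-1][:k]      # largest effects first
    return negative, positive


# A few values taken from the tables above.
effects = {"Jiu": -0.80, "Vader": -0.74, "acy": 1.52, "etics": 1.42, "ates": 1.35}
negative, positive = top_logits(effects, 2)
print(negative)  # [('Jiu', -0.8), ('Vader', -0.74)]
print(positive)  # [('acy', 1.52), ('etics', 1.42)]
```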
Activations Density 0.071%
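Activation density is the fraction of sampled tokens on which the feature fires, so 0.071% works out to roughly 71 firing tokens per 100,000. The dashboard does not state its firing threshold or its token sample. The minimal sketch below therefore assumes that "fires" means an activation strictly greater than zero. The `activation_density` helper and the synthetic activations are illustrative only.

```python
# Minimal sketch: activation density as the fraction of tokens whose
# activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations: list, threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: 71 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(71):
    acts[i] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.071%
```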