INDEX
Explanations
apologies and expressions of regret
expressions of apology and regret
Negative Logits
mosqu        -0.85
population   -0.79
unin         -0.75
monop        -0.73
density      -0.70
rones        -0.70
Locations    -0.69
Sear         -0.69
nergy        -0.68
Residents    -0.68
Positive Logits
apology      2.06
apologise    2.05
apologized   2.00
apologised   1.99
apologize    1.94
apologies    1.90
regrets      1.85
remorse      1.81
apologizing  1.79
regret       1.78
Activations Density: 0.631%