Explanations
expression of regret or apology
expressions of apology or regret
Negative Logits
Allied        -0.74
Bourbon       -0.73
Rhodes        -0.72
Tripoli       -0.70
mids          -0.68
RAF           -0.67
civilian      -0.67
JPEG          -0.67
Cincinnati    -0.66
Flickr        -0.66
Positive Logits
cause         1.05
U+FE0F        1.03
mean          0.97
────          0.93
thing         0.93
shall         0.92
agree         0.92
want          0.92
laughs        0.90
exist         0.90
Activations Density 0.207%