Explanations
phrases related to societal issues and policies
Negative Logits
Token     Logit
ãĥ¥       -0.66
UV        -0.65
Silence   -0.65
Horses    -0.64
¡         -0.64
PER       -0.63
Shack     -0.62
Norton    -0.62
LV        -0.61
GROUND    -0.61
Positive Logits
Token         Logit
ividual       1.15
istically     1.02
identifiable  0.96
who           0.90
ities         0.90
istic         0.89
hips          0.82
istical       0.82
istics        0.81
composing     0.81
Activations Density: 0.023%
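
The density figure can be read as the share of sampled token positions on which the feature fires. The sketch below is an illustration under that assumption, not the dashboard's own code: the function name `activation_density`, the zero threshold, and the sample values are all hypothetical, and the sample is constructed only to reproduce a figure of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of sampled token positions
    where the feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 token positions, 23 of which are active.
sample = [0.0] * 99_977 + [1.5] * 23
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.023% means the feature is active on roughly 23 of every 100,000 sampled tokens, so it fires very rarely.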