INDEX
Explanations
phrases related to popularity and frequency among certain groups
Negative Logits
terminated          -0.84
Transcript          -0.77
arations            -0.75
Letter              -0.70
equival             -0.70
gency               -0.70
claimer             -0.70
olon                -0.67
transcript          -0.67
olicy               -0.66
Positive Logits
locals               1.01
enthusiasts          0.97
tourists             0.96
collectors           0.94
travelers            0.92
conservatives        0.88
environmentalists    0.88
hunters              0.85
beginners            0.84
Europeans            0.84
Activations Density: 0.110%