INDEX
Explanations
phrases referring to something being popular or controversial
the term "hot" used in various contexts
Negative Logits
ufact      -0.95
uther      -0.79
ajor       -0.76
confir     -0.75
ĸļ         -0.73
eca        -0.73
atively    -0.73
ARDIS      -0.71
yss        -0.70
INAL       -0.69
Positive Logits
spots      0.92
ness       0.89
ened       0.87
Chili      0.87
hot        0.87
headed     0.86
Spot       0.85
stove      0.84
hotter     0.84
shots      0.83
Activations Density: 0.020%