INDEX
Explanations
mentions of the word "drug"
references to rugs and rug-related contexts
Negative Logits
Korean         -0.66
Sparkle        -0.64
Proposition    -0.63
Petra          -0.62
Highlands      -0.62
Founders       -0.61
Pebble         -0.61
Enlightenment  -0.59
Dresden        -0.59
Peninsula      -0.59
POSITIVE LOGITS
rug            1.32
uay            1.05
ular           0.91
ules           0.86
uese           0.85
unda           0.85
ulent          0.82
ula            0.81
recy           0.81
hess           0.81
Activations Density: 0.006%