INDEX
Explanations
the color red
references to the color red
NEGATIVE LOGITS
ILA          -0.85
ERY          -0.73
Get          -0.72
Math         -0.72
Hung         -0.72
OSH          -0.71
SPONSORED    -0.69
Film         -0.68
Technical    -0.67
renheit      -0.67
POSITIVE LOGITS
rawn         1.25
efined       1.01
neck         1.01
oub          0.95
oubt         0.94
headed       0.91
velvet       0.90
iscovery     0.89
iscovered    0.89
uces         0.88
Activations Density 0.016%