INDEX
Explanations
references to the color red
occurrences of the word "red"
Negative Logits
ILA         -0.84
Lank        -0.80
UGH         -0.79
ETHOD       -0.78
ernel       -0.77
Ö¼          -0.76
agall       -0.76
HAEL        -0.74
llah        -0.72
Technical   -0.71
Positive Logits
rawn         1.17
neck         1.12
oubt         1.10
efined       1.08
oub          1.02
headed       1.01
velvet       0.99
iscovered    0.96
iscovery     0.93
iscover      0.91
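The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way feature dashboards compute such scores, assumed here rather than confirmed by this page, is the dot product of the feature's decoder direction with each token's unembedding vector, followed by taking the k most negative and k most positive values. The sketch below is a minimal illustration under that assumption; `logit_effects`, `top_and_bottom`, the token names `tok_a` through `tok_d`, and all vectors are hypothetical toy inputs, not data from this feature.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: direct logit effect of one feature, ASSUMING each token
# is scored as dot(decoder_direction, unembedding[token]).
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


# Hypothetical helper: return the k most negative tokens (most negative first)
# and the k most positive tokens (most positive first).
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy 2-dimensional example with invented numbers.
decoder_dir = [1.0, 0.5]
unembed = {
    "tok_a": [0.8, 0.4],    # score 1.0
    "tok_b": [0.6, 1.0],    # score 1.1
    "tok_c": [-0.7, -0.3],  # score -0.85
    "tok_d": [-0.2, -0.9],  # score -0.65
}
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", [t for t, _ in negative])  # ['tok_c', 'tok_d']
print("positive:", [t for t, _ in positive])  # ['tok_b', 'tok_a']
```

In a real model the decoder direction and unembedding would come from the trained SAE and the language model, and some dashboards also account for the final layer norm; this sketch ignores that.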
Activation Density: 0.024%
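Activation density reports how often the feature fires: 0.024% corresponds to roughly 24 firing positions per 100,000 tokens. A minimal sketch, assuming density is the percentage of token positions whose activation is strictly above zero (the page does not state its exact threshold or evaluation dataset); `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

# Hypothetical helper: percentage of token positions where the feature fires,
# ASSUMING "fires" means activation strictly greater than `threshold`.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Invented sample: 24 firing positions out of 100,000 tokens.
sample = [0.0] * 99_976 + [1.3] * 24
print(f"{activation_density(sample):.3f}%")  # 0.024%
```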