INDEX
Explanations
mention of "Hot" in various contexts
Negative Logits
theless       -0.88
acknow        -0.80
conclud       -0.76
ufact         -0.74
ĸļ            -0.72
ajor          -0.71
xual          -0.71
Downloadha    -0.69
guiActiveUn   -0.69
confir        -0.69
Positive Logits
Spot          1.07
ness          1.05
shots         0.96
dog           0.96
fixes         0.95
fix           0.94
dogs          0.93
keys          0.93
bed           0.90
water         0.90
Activations Density 0.009%