Explanations
the word "Mars" with varying activations
mentions of the planet Mars
New Auto-Interp
Negative Logits
talk      -0.77
)</       -0.71
trophies  -0.66
voice     -0.66
purse     -0.65
FN        -0.64
FN        -0.63
intrins   -0.62
oho       -0.62
ND        -0.61
Positive Logits
Mars      3.89
Mars      3.42
Martian   2.16
Venus     1.97
Ceres     1.82
Pluto     1.74
rover     1.70
mars      1.68
Jupiter   1.49
Mercury   1.43
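The two logit lists rank vocabulary tokens by how much this feature pushes their next-token logits up or down. The sketch below is a minimal illustration of one common way to build such lists. It assumes the values come from projecting the feature's decoder direction through the model's unembedding matrix, which this page does not confirm. The functions `logit_effects` and `top_and_bottom`, and the toy vectors in the usage line, are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Project a feature's decoder direction through an unembedding matrix.

    Assumption (hypothetical setup): unembed[i][t] is the weight from
    residual-stream dimension i to vocabulary token t.
    Returns one (token, effect) pair per vocabulary entry.
    """
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder vector with column t of the unembedding.
        score = sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        effects.append((token, score))
    return effects


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Return the k most positive and k most negative (token, effect) pairs."""
    positive = sorted(effects, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(effects, key=lambda pair: pair[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional residual stream and 4-token vocabulary (made-up numbers).
    vocab = ["Mars", "Venus", "talk", "purse"]
    unembed = [[2.0, 1.0, -0.5, -0.2],
               [1.0, 1.0, -0.6, -0.8]]
    pos, neg = top_and_bottom(logit_effects([1.0, 0.5], unembed, vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With these toy numbers, "Mars" and "Venus" come out on top and "talk" and "purse" at the bottom, mirroring the shape of the real lists without reproducing their values.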
Activations Density 0.016%
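Activation density is read here as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the page does not state the exact criterion. The function `activation_density` and its `threshold` parameter are hypothetical. Under that reading, 0.016% works out to about 16 firing positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds `threshold` (hypothetical criterion)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: 16 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_984 + [1.0] * 16
    print(f"{activation_density(acts):.3f}%")
```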