Explanations
phrases related to being physically outside of a specific location
instances of the word "out."
Negative Logits
gemony      -0.80
ĪĴ          -0.70
mosaic      -0.64
subp        -0.64
eways       -0.62
Genocide    -0.62
erity       -0.61
encies      -0.60
abhor       -0.59
apses       -0.59
Positive Logits
fitted       1.26
partying     1.01
doing        1.00
raged        0.96
ranged       0.96
done         0.95
celebrating  0.92
fed          0.90
stretched    0.89
fitting      0.88
Activations Density: 0.135%