Explanations
words related to reflection or mirroring
references to mirrors and mirror imagery
Negative Logits
Logit   Token
-0.80   --------------------------------------------------------
-0.74   CVE
-0.74   ensable
-0.71   stant
-0.70   ndra
-0.69   ties
-0.68   estern
-0.67   iott
-0.67   iencies
-0.66   tions

Positive Logits
Logit   Token
 1.02   ror
 1.00   neuron
 0.96   ocular
 0.95   image
 0.88   mirror
 0.87   angelo
 0.87   reflection
 0.86   image
 0.86   Mirror
 0.85   images
Activations Density 0.033%
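
To give the density figure a sense of scale, here is a minimal sketch that converts it into expected counts. It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation; the page does not state this definition, and the helper name `density_to_counts` is hypothetical.

```python
def density_to_counts(density_percent: float, n_tokens: int) -> float:
    """Expected number of activating tokens among n_tokens.

    Assumption: density is the percentage of tokens on which the
    feature fires (nonzero activation). The page does not define it.
    """
    return density_percent / 100.0 * n_tokens


density_percent = 0.033  # value reported on the page

# Expected activating tokens per one million tokens of text.
expected_per_million = density_to_counts(density_percent, 1_000_000)

# Average number of tokens between activations (reciprocal of the fraction).
tokens_per_activation = 100.0 / density_percent

print(round(expected_per_million), round(tokens_per_activation))
```

Under that assumption, the feature would fire on roughly 330 of every million tokens, or about one token in 3,030, which fits a feature that responds to a narrow topic such as mirrors and reflections.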