INDEX
Explanations
mentions or descriptions of things that are physically lightweight or not heavy
instances of the word "light" or its associations
Negative Logits
Token        Logit
halla        -0.98
ettings      -0.84
isner        -0.74
utor         -0.73
OPLE         -0.73
apons        -0.72
ij士         -0.72
OUP          -0.70
berus        -0.69
Forbidden    -0.68
Positive Logits
Token        Logit
bulb         1.21
hearted      1.17
weights      1.15
bul          1.15
ening        1.15
enment       1.10
bulbs        1.09
lights       1.04
heartedly    1.02
ener         0.97
Activations Density: 0.027%