Explanations
references to "Our" or possessive pronouns indicating belonging or connection
Negative Logits
ãĥ³ãĥĹ    -0.15
vell      -0.15
399       -0.15
ãĥ¥ãĥ¼    -0.15
elly      -0.15
antes     -0.15
annt      -0.14
ampler    -0.14
/device   -0.14
antas     -0.14
Positive Logits
Light     0.18
Lite      0.17
light     0.17
_light    0.17
át        0.15
Light     0.15
lights    0.15
éħ        0.15
LIGHT     0.15
åIJ       0.15
Activations Density 0.047%