INDEX
Explanations
references to glass containers or objects
Negative Logits
stery       -0.17
sts         -0.17
ertools     -0.17
eva         -0.17
memberof    -0.15
akeup       -0.15
tk          -0.15
glass       -0.15
deaux       -0.15
orsi        -0.15
Positive Logits
ware        0.33
gow         0.30
y           0.28
ses         0.27
(es         0.26
ed          0.25
ine         0.24
wort        0.24
house       0.23
work        0.23
Activations Density 0.010%