INDEX
Explanations
references to "house" or similar dwelling-related terms
Negative Logits
zym          -0.18
owa          -0.17
ت            -0.16
ादन          -0.16
housed       -0.16
theless      -0.16
llum         -0.15
thic         -0.15
atically     -0.15
thal         -0.15
Positive Logits
wives        0.32
keeping      0.31
wares        0.31
wife         0.31
boat         0.30
guest        0.29
boats        0.26
mates        0.26
holds        0.25
mate         0.25
Activations Density 0.063%