INDEX
Explanations
references to wheelchairs
Negative Logits
dül      -0.16
dense    -0.16
ittel    -0.15
yne      -0.15
tle      -0.15
oulos    -0.15
enthal   -0.15
ional    -0.15
olta     -0.14
lope     -0.14
Positive Logits
chair    0.43
wright   0.32
bar      0.31
ie       0.30
-chair   0.28
chair    0.28
base     0.28
Chair    0.28
house    0.28
ing      0.27
Activation Density: 0.012%