INDEX
Explanations
actions or scenarios involving falling from a height
Negative Logits
urg         -0.72
ihu         -0.67
ounters     -0.67
Dial        -0.65
outhern     -0.64
don         -0.63
HOU         -0.63
ilingual    -0.62
PN          -0.62
çͰ          -0.62
Positive Logits
ceiling     1.01
grace       1.00
ceilings    0.90
cliffs      0.90
cliff       0.88
trees       0.85
roof        0.84
window      0.83
balcony     0.83
bushes      0.81
Activations Density: 0.121%