INDEX
Explanations
phrases related to locations or directions
references to hands and their positions in various contexts
Negative Logits
vell      -0.77
uni       -0.77
urity     -0.74
anamo     -0.70
endez     -0.69
idian     -0.66
aturday   -0.66
qqa       -0.65
burgh     -0.65
ournal    -0.64
Positive Logits
lers      1.07
shake     0.90
ling      0.84
%%%%      0.80
erers     0.75
maid      0.73
--------------------------------------------------------  0.72
legate    0.70
ners      0.68
held      0.67
Activations Density 0.009%