Explanations
references to physical or metaphorical openings
Negative Logits
Token         Logit
rior          -0.74
CTV           -0.71
VICE          -0.67
grade         -0.66
Bey           -0.64
ドラゴン      -0.64
orum          -0.63
rag           -0.62
constitu      -0.61
iable         -0.60
Positive Logits
Token         Logit
doors         1.10
Doors         1.09
portals       0.89
loopholes     0.83
up            0.82
ource         0.81
wounds        0.79
doors         0.77
curtains      0.77
reopen        0.76
Activation Density: 0.396%