INDEX
Explanations
references to physical disconnection or amputation
phrases related to cutting off or severing connections, resources, or limbs
Negative Logits
Bench     -0.80
WM        -0.73
antine    -0.72
Rating    -0.70
episode   -0.69
ECH       -0.68
FUL       -0.68
MAT       -0.67
older     -0.65
OD        -0.65
Positive Logits
communication    1.06
access           1.02
contact          0.98
limbs            0.89
ties             0.87
valves           0.86
communications   0.86
limb             0.84
supply           0.84
disbelief        0.83
Activations Density 0.067%