INDEX
Explanations
references to cages
Negative Logits
Bundes     -0.73
ACTED      -0.69
Nare       -0.68
merce      -0.66
lender     -0.64
ibel       -0.64
IELD       -0.62
isance     -0.62
igate      -0.60
nee        -0.60
Positive Logits
cage       0.93
Cage       0.92
cages      0.87
mong       0.87
pit        0.85
washer     0.84
ModLoader  0.84
door       0.80
hold       0.77
lov        0.74
Activations Density: 0.023%