INDEX
Explanations
references to the concept of a "shield"
references to protective or defensive mechanisms
Negative Logits
Token      Logit
Vide       -0.78
ETA        -0.77
uni        -0.75
ovie       -0.75
pheus      -0.72
PM         -0.71
Helpful    -0.71
orie       -0.70
gres       -0.70
orig       -0.69
Positive Logits
Token      Logit
shields    1.24
shield     1.22
shielding  0.97
Shield     0.97
Shield     0.93
maid       0.90
shield     0.90
defences   0.85
curtain    0.83
Shields    0.81
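The two logit lists read as the ten lowest and ten highest entries of a per-token score. A minimal sketch of that selection follows, assuming the scores are available as (token, value) pairs; the function name `top_and_bottom` and the small `sample` list are hypothetical, and the page does not say how the scores themselves are computed.

```python
import heapq


def top_and_bottom(logit_effects, k=10):
    """Return (most_negative, most_positive) lists of (token, value) pairs.

    logit_effects is a list of pairs rather than a dict, because the same
    token string can appear more than once in the listings.
    """
    most_negative = heapq.nsmallest(k, logit_effects, key=lambda pair: pair[1])
    most_positive = heapq.nlargest(k, logit_effects, key=lambda pair: pair[1])
    return most_negative, most_positive


# A few values copied from the listings; a real run would cover the whole vocabulary.
sample = [
    ("shields", 1.24),
    ("shield", 1.22),
    ("shielding", 0.97),
    ("Vide", -0.78),
    ("ETA", -0.77),
    ("uni", -0.75),
]
neg, pos = top_and_bottom(sample, k=2)
```

With `k=2`, `neg` holds the two most negative pairs in ascending order and `pos` the two most positive in descending order, matching the ordering used in the listings.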
Activations Density 0.005%
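The page does not define the density figure. One common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below computes that fraction; `activation_density` and the synthetic activation list are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations):
    """Fraction of tokens whose activation is strictly positive.

    Assumption: density = (tokens with activation > 0) / (tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for value in activations if value > 0)
    return active / len(activations)


# Synthetic example: 1 active token out of 20,000 gives 1 / 20000 = 0.00005, i.e. 0.005%.
density = activation_density([0.0] * 19_999 + [3.2])
density_percent = f"{density:.3%}"
```

Under this reading, a density of 0.005% means the feature fires on roughly one token in every 20,000.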