INDEX
Explanations
phrases related to safety or protection
NEGATIVE LOGITS
quart        -0.81
lished       -0.76
RESULTS      -0.76
licks        -0.75
avez         -0.74
onde         -0.73
aminer       -0.73
ruary        -0.72
zig          -0.71
gres         -0.71
POSITIVE LOGITS
shield        0.89
Protective    0.85
custody       0.82
protective    0.82
enclosure     0.79
apparatus     0.78
casing        0.78
encl          0.77
coat          0.76
duty          0.76
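Lists like the positive and negative logits above are commonly produced by projecting a feature's decoder (output) direction onto the model's unembedding vectors and ranking tokens by the resulting dot product. The sketch below assumes that procedure. The helper names `logit_effects` and `top_and_bottom`, the 3-dimensional toy vectors, and their values are hypothetical and are not this feature's real weights.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a feature's output direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # largest first
    return positive, negative


# Toy, made-up example (hypothetical vectors, not real model weights).
direction = [1.0, 0.0, 0.5]
unembed = {
    "shield": [0.8, 0.1, 0.2],
    "casing": [0.6, 0.0, 0.3],
    "quart": [-0.7, 0.2, -0.2],
    "ruary": [-0.5, 0.0, -0.4],
}
pos, neg = top_and_bottom(logit_effects(direction, unembed), k=2)
print([tok for tok, _ in pos])  # ['shield', 'casing']
print([tok for tok, _ in neg])  # ['quart', 'ruary']
```

A positive value means the feature pushes that token's logit up when it fires; a negative value means it pushes the logit down.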
Activation density: 0.009%
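Activation density is usually the fraction of token positions on which the feature fires, meaning its activation is above zero. Under that assumption, 0.009% corresponds to roughly 9 firing positions per 100,000 tokens. The minimal sketch below uses a hypothetical `activation_density` helper and made-up activations to show the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: the feature fires on 9 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 10_000] = 1.5

print(f"{activation_density(acts):.3%}")  # 0.009%
```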