Explanations
terms and phrases related to safety and security in various contexts
Head Attr Weights
0:0.02
1:0.01
2:0.10
3:0.05
4:0.11
5:0.02
6:0.04
7:0.39
8:0.03
9:0.04
10:0.08
11:0.06
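
As a rough illustration of how the head attribution weights above might be read, the sketch below copies the twelve listed values into a dictionary, sums them, and ranks the heads. The variable names are hypothetical and not part of the dashboard. The listed values sum to 0.95 rather than 1.00; the sketch assumes this is a rounding effect and reports each head's share of the listed total.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Names such as head_weights and ranked are illustrative, not from the source.
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.10, 3: 0.05,
    4: 0.11, 5: 0.02, 6: 0.04, 7: 0.39,
    8: 0.03, 9: 0.04, 10: 0.08, 11: 0.06,
}

# Sum of the listed (rounded) weights.
total = sum(head_weights.values())

# Heads sorted from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Share of the listed total carried by the top head.
top_share = top_weight / total

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f} ({weight / total:.0%} of listed weight)")
```

On these numbers, head 7 carries far more weight than any other single head, with heads 4 and 2 next.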
Negative Logits
issance      -1.82
acio         -1.81
cart         -1.67
obsessed     -1.57
nostalg      -1.56
orig         -1.54
Yamato       -1.47
ventures     -1.45
ERN          -1.39
obsession    -1.39
Positive Logits
safest       1.74
safe         1.72
unprotected  1.70
secure       1.68
Secure       1.63
bunker       1.59
snakes       1.54
safer        1.54
risk         1.52
Safe         1.52
Activations Density 0.025%