Explanations
words related to safety or potential danger
words related to safety and danger
Negative Logits
sis         -0.63
occasional  -0.61
essays      -0.61
storms      -0.61
elusive     -0.61
Winged      -0.58
Honour      -0.57
Kinnikuman  -0.57
ethn        -0.57
sails       -0.56
Positive Logits
afe          1.21
terness      0.94
becue        0.91
zzle         0.87
ctuary       0.86
afa          0.85
ffe          0.84
eteria       0.83
cakes        0.83
yip          0.79
Activation density: 0.004%