Explanations
themes related to safety and security, particularly in travel
Negative Logits
Fut        -0.15
apro       -0.15
akis       -0.14
smugg      -0.14
casc       -0.14
öl         -0.14
يدا        -0.13
_GRE       -0.13
atte       -0.13
ź          -0.13
Positive Logits
safety     0.43
Safety     0.41
Safety     0.38
afety      0.33
安全 (Chinese: "safety, security")    0.32
security   0.29
safer      0.29
Safe       0.28
safe       0.28
Safe       0.27
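Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. This page does not state which model or projection was used, so the following is only a minimal sketch under that assumption. The helpers `feature_logits` and `top_and_bottom` and every number are hypothetical toy values; a few token strings are borrowed from the lists above purely for readability. Real tooling may also fold in the final layer norm or other adjustments, which this sketch omits.

```python
# Hypothetical sketch: score each vocabulary token by the dot product of a
# feature's decoder direction with that token's unembedding column, then keep
# the most positive and most negative scores. Toy values only; no real model.

def feature_logits(decoder_dir, unembed, vocab):
    """Return {token: logit}, where logit = decoder_dir . unembed[:, j]."""
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }

def top_and_bottom(logits, k):
    """Return (k most positive, k most negative) as (token, logit) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Toy setup (assumed): d_model = 2, vocabulary of 4 tokens.
vocab = ["safety", "security", "Fut", "apro"]
unembed = [
    [0.5, 0.4, -0.1, -0.2],  # row 0 of a d_model x vocab unembedding matrix
    [0.1, 0.0, -0.1, 0.0],   # row 1
]
decoder_dir = [0.8, 0.3]     # hypothetical feature direction

logits = feature_logits(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(logits, k=2)
print([(t, round(v, 2)) for t, v in positive])  # → [('safety', 0.43), ('security', 0.32)]
print([(t, round(v, 2)) for t, v in negative])  # → [('apro', -0.16), ('Fut', -0.11)]
```

Because the score is a single dot product per token, the negative list is just the other end of the same ranking rather than a separate computation.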
Activation Density: 0.060%
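Assuming density here means the share of sampled tokens on which the feature's activation is positive, 0.060% corresponds to roughly 6 firing tokens per 10,000. The sketch below shows that arithmetic; `activation_density` and the activation values are hypothetical, and the page does not say which threshold or token sample it used.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# where a feature's activation is strictly above a threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy sample (assumed): 10,000 token positions, the feature fires on 6.
acts = [0.0] * 10_000
for pos in (12, 480, 1999, 5003, 7777, 9998):
    acts[pos] = 1.5

density = activation_density(acts)
print(f"density = {density:.3f}%")  # → density = 0.060%
```

A density this low marks the feature as sparse: it stays silent on almost all text and fires only in the narrow contexts the explanation describes.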