INDEX
Explanations
words and phrases associated with death, danger, or critical situations
Negative Logits
死亡        -0.16
ee          -0.15
tte         -0.15
@student    -0.15
burgh       -0.15
اره         -0.14
edom        -0.14
alom        -0.14
eka         -0.14
života      -0.14
Positive Logits
ities           0.23
flaw            0.23
istic           0.21
blow            0.20
consequences    0.19
flaws           0.19
dose            0.19
istically       0.18
ilty            0.18
ogy             0.17
Activations Density 0.018%