INDEX
Explanations
words or phrases related to damage and injury in various contexts
Negative Logits
cogn      -0.20
hod       -0.14
Fr        -0.14
Rol       -0.14
Talent    -0.14
αι        -0.14
TA        -0.13
ton       -0.13
etr       -0.13
اÛĮÙĩ      -0.13
Positive Logits
outu              0.16
reme              0.16
LabelText         0.15
\Bridge           0.15
.scalablytyped    0.15
↵↵                0.14
åľĨ               0.14
anden             0.14
PLUGIN            0.14
_smooth           0.14
Activations Density 0.006%