INDEX
Explanations
words related to intense, risky, and life-threatening situations
Negative Logits
iland           -0.73
wagen           -0.69
bial            -0.68
packs           -0.65
awaru           -0.65
Saunders        -0.64
Fram            -0.62
ForgeModLoader  -0.61
Accessory       -0.61
hyde            -0.61
Positive Logits
ention           1.03
itial            1.02
ensions          0.93
ellect           0.92
ract             0.92
itially          0.92
ellig            0.90
ension           0.89
ensive           0.88
icult            0.87
Activations Density 0.011%