Explanations
references to safety or security
references to safety and secure spaces
New Auto-Interp
Negative Logits
naire              -0.81
yss                -0.73
issance            -0.73
fred               -0.70
mot                -0.67
elig               -0.65
XIII               -0.62
kay                -0.61
Fiber              -0.60
hiba               -0.59
Positive Logits
havens              1.14
keeping             1.09
haven               1.00
haven               0.95
harbor              0.94
inventoryQuantity   0.89
house               0.87
harbour             0.87
Haven               0.84
deposit             0.82
Activation density: 0.039%