INDEX
Explanations
phrases that emphasize safety and security in various contexts
Head Attr Weights
Head  Weight
0     0.06
1     0.02
2     0.10
3     0.29
4     0.02
5     0.11
6     0.02
7     0.05
8     0.03
9     0.03
10    0.20
11    0.02
Negative Logits
Token        Logit
resentment   -2.11
historian    -2.10
govtrack     -2.07
inspector    -2.02
distortions  -2.01
Inspector    -1.99
azeera       -1.95
genius       -1.94
detail       -1.94
Monstrous    -1.93
Positive Logits
Token                             Logit
indoors                           2.17
beverages                         2.14
outdoors                          2.06
********************************  2.05
cannabis                          2.01
safe                              2.00
enter                             2.00
beverage                          2.00
safest                            1.95
noon                              1.92
Activations Density 0.030%