INDEX
Explanations
Words related to transportation and locations, potentially focusing on negative instances
Negative phrases related to various forms of loss or deprivation
NEGATIVE LOGITS
Token       Logit
ulhu        -0.62
Dickinson   -0.55
AVG         -0.52
EntityItem  -0.52
Burr        -0.51
Norris      -0.50
Richards    -0.49
Meadows     -0.49
Borders     -0.48
Rica        -0.47
POSITIVE LOGITS
Token       Logit
sized       1.03
based       0.95
level       0.92
shaped      0.88
themed      0.88
grade       0.86
eyed        0.86
related     0.85
derived     0.84
powered     0.84
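The negative and positive logit lists above rank vocabulary tokens by how strongly this feature pushes them down or up. A minimal sketch of how such lists could be produced, assuming each score is the dot product of the feature's decoder direction with a token's unembedding vector (the dashboard does not state whether any normalization is applied). The function name `logit_effects` and the toy two-dimensional vocabulary are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens for a feature.

    Assumption: a token's score is the dot product of the feature's decoder
    direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Hypothetical toy vocabulary with 2-dimensional unembedding vectors.
toy_unembed = {
    "sized": [1.0, 0.0],
    "based": [0.5, 0.5],
    "Rica": [-0.5, 0.0],
    "Burr": [0.0, 0.6],
}
neg, pos = logit_effects([1.0, -0.5], toy_unembed, k=2)
print("negative:", neg)
print("positive:", pos)
```

In a real model the unembedding matrix covers the whole vocabulary, so only the top and bottom few entries are displayed, as in the two lists above.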
Activation density: 0.319%
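An activation density of 0.319% means the feature fires on roughly 3.19 of every 1,000 tokens (0.319% × 1,000 = 3.19). A minimal sketch of the calculation, assuming "active" means an activation strictly greater than zero; the threshold parameter and the function name `activation_density` are illustrative assumptions, not the dashboard's documented definition.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature is active.

    Assumption: a token counts as active when its activation exceeds
    `threshold` (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: 1,000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.0]
print(f"{activation_density(sample):.3f}%")
```

A density this low is typical of a sparse feature: it stays silent on the vast majority of tokens.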