Explanations
references to risks or dangers faced by individuals in challenging situations
phrases that indicate a sense of urgency or importance
Negative Logits
bots        -0.79
Rules       -0.72
words       -0.70
anism       -0.69
letters     -0.67
tests       -0.66
items       -0.66
lest        -0.65
coins       -0.64
(tenth token missing)  -0.63
Positive Logits
lot           1.25
considerable  1.12
plethora      1.11
tremendous    1.10
bunch         1.06
huge          1.05
significant   1.05
substantial   1.01
sizeable      0.99
glimpse       0.99
Activation density: 0.535%
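To help interpret the numbers above, here is a minimal sketch of how dashboards like this one typically derive them. It assumes, without confirmation from this page, that activation density is the fraction of tokens on which the feature's activation exceeds zero, and that the logit lists rank vocabulary tokens by the dot product of the feature's decoder direction with each column of the model's unembedding matrix. The function names `activation_density` and `rank_logits` are hypothetical, and real tools may normalize or scale these scores differently.

```python
from typing import List, Sequence, Tuple


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires (activation > threshold).

    Assumption: density is a simple firing rate over a sample of tokens.
    """
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > threshold) / len(activations)


def rank_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # shape: d_model x vocab_size
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect.

    Assumption: a token's score is decoder_dir . unembed[:, token].
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    scores.sort(key=lambda pair: pair[1])  # ascending: most suppressed first
    negative = scores[:k]
    positive = list(reversed(scores))[:k]  # descending: most promoted first
    return negative, positive


if __name__ == "__main__":
    # Toy data only; these values are illustrative, not from the dashboard.
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
    neg, pos = rank_logits([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
    print(neg, pos)
```

Under this reading, a density of 0.535% would mean the feature fires on roughly 1 in 190 tokens, and the positive list (lot, considerable, plethora, ...) consists of tokens whose logits the feature most increases.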