INDEX
Explanations
phrases related to being first or starting something
instances of the word "First."
Negative Logits
spir          -0.71
yne           -0.64
supporting    -0.61
radi          -0.60
toxic         -0.60
trails        -0.59
under         -0.59
sob           -0.59
around        -0.58
wound         -0.58
Positive Logits
First          3.50
first          2.41
Firstly        2.41
First          2.36
Firstly        1.99
Fourth         1.87
FIRST          1.81
Second         1.77
Third          1.69
Secondly       1.54
Activations Density 0.011%