Explanations
references to specific values or amounts
the token representing the end of text
Negative Logits
campaigned    -0.55
Niet          -0.53
tooth         -0.51
Seym          -0.51
vulner        -0.51
precious      -0.50
coupled       -0.50
apart         -0.49
calling       -0.49
nationally    -0.49
Positive Logits
ggles         1.25
ilet          1.04
pload         1.01
wered         0.98
pless         0.92
dos           0.90
denote        0.89
ffee          0.88
othy          0.87
asts          0.87
Activations Density: 0.051%