INDEX
Explanations
the word "ress" followed by a high activation value, particularly "Tress"
instances of the word "press" and its variations
Negative Logits
©¶æ          -0.83
ãĥ£          -0.80
£ı           -0.75
vernment     -0.68
rily         -0.66
subp         -0.65
prus         -0.64
volunte      -0.64
lder         -0.63
hemisphere   -0.62
Positive Logits
ively        1.06
ions         1.05
ional        0.99
encer        0.91
ants         0.91
IVE          0.89
entials      0.86
entially     0.86
ives         0.86
mann         0.86
Activations Density: 0.018%