INDEX
Explanations
the word "Att" at different activation levels
mentions of "Att" or related terms typically associated with attention or attachment
Negative Logits
theless       -0.90
姫            -0.85
enegger       -0.84
FTWARE        -0.83
assetsadobe   -0.78
サーティワン  -0.78
ktop          -0.78
76561         -0.77
Giuliani      -0.76
REDACTED      -0.75
Positive Logits
anooga        1.18
itudes        1.17
ributed       1.16
ributes       1.13
ribute        1.12
ainment       1.11
achment       1.09
itude         1.08
ension        1.07
ention        1.03
Activations Density: 0.006%