INDEX
Explanations
user-related terms or strings in the text
references to user identifiers or mentions of users
Negative Logits
Baptist       -0.75
Lutheran      -0.69
amer          -0.65
forth         -0.63
Hurricanes    -0.63
hovah         -0.63
Maid          -0.61
erella        -0.59
SourceFile    -0.59
Vaugh         -0.58
Positive Logits
interface     1.05
interfaces    1.01
interface     1.00
Interface     0.88
base          0.87
atical        0.85
Interface     0.83
Agent         0.82
pace          0.79
hook          0.77
Activations Density 0.026%