INDEX
Explanations
phrases related to actions or activities that are considered meaningful or impactful
Negative Logits
Norn        -0.92
Norwich     -0.89
McCl        -0.87
Lars        -0.86
Matthews    -0.84
Maiden      -0.80
borough     -0.80
Smy         -0.80
Lind        -0.79
Whale       -0.79
Positive Logits
Action      1.32
action      1.30
action      1.29
activ       1.29
ACTION      1.26
Action      1.26
activity    1.24
actions     1.21
Actions     1.19
activity    1.18
Activations Density 0.412%