Explanations
phrases indicating a similarity or comparison
phrases emphasizing frequency or repetition
Negative Logits
Token       Logit
Louie       -0.80
Kirby       -0.59
behav       -0.58
gio         -0.57
mons        -0.57
olor        -0.57
Bard        -0.55
Regions     -0.55
Galile      -0.55
wisely      -0.55

Positive Logits
Token       Logit
rontal      0.69
aneously    0.68
èª          0.67
ties        0.66
istic       0.66
itals       0.64
RANT        0.64
RS          0.64
osher       0.63
maxwell     0.62
Activations Density: 0.135%