INDEX
Explanations
phrases indicating nearly the same or very similar items, circumstances, or actions
phrases and words emphasizing frequency or recurrence
Negative Logits
Louie     -0.84
Kirby     -0.65
Bard      -0.64
Ki        -0.62
Colors    -0.62
andise    -0.58
gio       -0.57
Blend     -0.57
Jord      -0.57
":["      -0.56
Positive Logits
rontal    0.73
lyak      0.72
èª        0.71
osher     0.68
haust     0.66
spoiler   0.65
ï¸        0.65
maxwell   0.64
Ö         0.64
Ïī        0.63
Activations Density 0.188%
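
The source does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name activation_density, the threshold parameter, and the toy data are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the dashboard): 1000 token positions, 2 of which fire.
toy = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(toy):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.188% would mean the feature fires on roughly 19 of every 10,000 sampled tokens.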