Explanations
phrases that repeat or describe recurring actions
phrases related to repetition or reiteration
Negative Logits
minster     -0.74
Brave       -0.73
ardless     -0.71
behind      -0.67
vana        -0.66
uala        -0.65
Squadron    -0.62
Bu          -0.61
nas         -0.61
emouth      -0.59
Positive Logits
hang        0.87
hump        0.85
ealous      0.83
whelming    0.81
fence       0.74
arching     0.72
weekend     0.71
whel        0.70
drive       0.70
hundred     0.68
Activations Density: 0.099%