INDEX
Explanations
recurring patterns or repetitions of terms, particularly the term "bro"
Negative Logits
Token     Logit
ogene     -0.80
hered     -0.80
disapp    -0.79
Julie     -0.75
Bauer     -0.72
ived      -0.67
adul      -0.65
IFE       -0.65
Ñı        -0.64
enlarg    -0.64
Positive Logits
Token     Logit
t         1.35
tz        1.19
T         1.18
ts        1.14
Ts        1.13
tty       1.13
TD        1.09
ti        1.07
tis       1.07
TS        1.06
Activations Density 0.109%