INDEX
Explanations
descriptions of things that are perceived as unexciting or lacking in interest
Negative Logits
lix         -0.16
Alleg       -0.16
anga        -0.15
atis        -0.15
blindness   -0.14
Hum         -0.14
alink       -0.14
Angeles     -0.14
Tick        -0.14
936         -0.14
Positive Logits
Jenner       0.16
ayne         0.15
abyrin       0.15
ỳ            0.15
ecess        0.14
kop          0.14
ザー         0.14
향           0.14
溪           0.14
.twitch      0.14
Activations Density 0.005%
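The two logit lists and the density figure above are typically derived as follows, though this is an assumption about the dashboard rather than something it states: the feature's decoder direction is projected through the model's unembedding matrix to give one value per vocabulary token, and density is the fraction of tokens on which the feature fires. The sketch below uses NumPy; the function names `logit_effects` and `activation_density`, and the identity "unembedding" in the usage lines, are hypothetical and chosen only for illustration.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction
    (shape (d_model,)) through an unembedding W_U (shape (d_model, vocab_size))
    and return the k most negative and k most positive (token, value) pairs."""
    logits = decoder_dir @ W_U          # one value per vocabulary token
    order = np.argsort(logits)          # indices sorted ascending by value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(acts):
    """Hypothetical helper: fraction of tokens with a strictly positive
    feature activation (SAE activations are usually ReLU outputs, so >= 0)."""
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > 0) / acts.size)

# Toy usage: an identity "unembedding" so the logits equal the direction itself.
vocab = ["lix", "Jenner", "Tick", "ayne"]
direction = np.array([-0.16, 0.16, -0.14, 0.15])
neg, pos = logit_effects(direction, np.eye(4), vocab, k=2)
print(neg)  # [('lix', -0.16), ('Tick', -0.14)]
print(pos)  # [('Jenner', 0.16), ('ayne', 0.15)]

acts = np.zeros(20_000)
acts[0] = 3.2                               # the feature fires on 1 token in 20,000
print(f"{activation_density(acts):.3%}")    # 0.005%
```

Under this reading, a density of 0.005% means the feature is active on roughly one token in every 20,000.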