Explanations
references to chat applications and chatbots
Negative Logits
hower       -0.17
ÏĤ          -0.17
uv          -0.17
olis        -0.17
ahy         -0.16
halb        -0.16
umberland   -0.15
edList      -0.15
chair       -0.15
swire       -0.15
Positive Logits
ting        0.35
roulette    0.31
anooga      0.30
rooms       0.28
bots        0.26
room        0.25
bot         0.25
rooms       0.24
ROOM        0.21
-room       0.20
Activations Density 0.019%