INDEX
Explanations
mentions of the platform Twitter
Negative Logits
Token       Logit
694         -0.16
orious      -0.15
ãĥ¼ãĥ³      -0.15
ins         -0.15
oodle       -0.14
rgan        -0.14
ellan       -0.14
opoulos     -0.14
itel        -0.14
[sub        -0.13
Positive Logits
Token       Logit
isor        0.18
uç          0.15
sti         0.15
ati         0.15
izen        0.14
-ci         0.14
öt          0.14
stras       0.14
olina       0.14
visor       0.14
Activations Density: 0.008%