INDEX
Explanations
words related to images or symbols
Negative Logits
itel     -0.18
i        -0.18
erate    -0.17
iou      -0.16
e        -0.16
al       -0.15
aliz     -0.15
er       -0.15
QA       -0.14
appen    -0.14
Positive Logits
othy        0.25
ergic       0.17
çī          0.17
pressions   0.16
jal         0.16
iliki       0.16
PLE         0.16
ENSION      0.15
rod         0.15
yasal       0.15
Activations Density 0.084%