INDEX
Explanations
phrases that indicate comparisons or evaluations of concepts
Negative Logits
exist       -0.19
aran        -0.18
zsche       -0.18
Exist       -0.17
exists      -0.15
existence   -0.15
exists      -0.15
ugo         -0.15
exist       -0.15
Exist       -0.15
Positive Logits
addock      0.17
concerned   0.17
edReader    0.16
ł           0.15
YN          0.15
odyn        0.14
bcc         0.14
abus        0.14
ivas        0.14
iffies      0.14
Activations Density 0.043%