INDEX
Explanations
words or phrases that indicate some kind of knowledge like "know", "believe", or hints about a plan
thoughts
Negative Logits
think         -1.49
know          -1.44
believe       -1.32
want          -1.23
hope          -1.18
say           -1.18
see           -1.16
wish          -1.15
feel          -1.13
appreciate    -1.13
Positive Logits
OGND          0.60
enerbah       0.56
檚            0.53
Cubit         0.53
terem         0.53
serem         0.51
MongoClient   0.50
windowFixed   0.50
Смо           0.49
Save          0.49
Activations Density: 4.622%