INDEX
Explanations
nouns related to technology and objects
Negative Logits
hopes       -0.69
regrets     -0.69
tains       -0.62
fears       -0.61
discovers   -0.61
grounds     -0.60
tries       -0.60
Grounds     -0.60
believes    -0.60
agrees      -0.59
POSITIVE LOGITS
are         1.17
aren        1.17
ARE         1.04
comprise    0.96
differ      0.93
vary        0.92
are         0.91
weren       0.91
were        0.90
tend        0.90
Activations Density: 0.471%