INDEX
Explanations
adjectives related to possible actions or qualities
words related to fulfillment or successful actions
Negative Logits
ernels        -0.77
Downloadha    -0.77
ivari         -0.76
ISC           -0.71
IRO           -0.70
atform        -0.69
ocl           -0.69
ribe          -0.69
æ©            -0.69
CHAT          -0.68
Positive Logits
theless       0.96
ignorance     0.81
glances       0.76
NESS          0.76
terday        0.76
idiots        0.75
tarian        0.74
filled        0.73
Appearances   0.71
vous          0.70
Activations Density: 0.071%