INDEX
Explanations
words related to things being uncomplicated or straightforward
phrases that describe simplicity or ease of an action or task
Negative Logits
eters      -0.90
mbuds      -0.83
hips       -0.78
reon       -0.74
raints     -0.73
geist      -0.70
akin       -0.70
arians     -0.69
Opera      -0.68
grave      -0.68
Positive Logits
prey       0.84
Jet        0.81
going      0.80
easy       0.76
ドラ       0.76
bruising   0.75
wallet     0.74
wired      0.73
forgiving  0.71
enough     0.70
Activations Density: 0.025%