Explanations
phrases related to ease or simplicity
Negative Logits
Token      Logit
grave      -0.87
eters      -0.86
hips       -0.80
emp        -0.73
orp        -0.73
yss        -0.72
raints     -0.71
shall      -0.70
reon       -0.68
inburgh    -0.68
Positive Logits
Token            Logit
Jet              1.00
prey             0.89
going            0.80
forgiving        0.77
understandable   0.76
easy             0.73
accessible       0.72
stroll           0.71
achievable       0.70
Reply            0.70
Activations Density: 0.653%
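
A minimal sketch of how a density figure like this could be computed, assuming "activations density" means the fraction of token positions on which the feature fires above some threshold. The page does not state that definition, so treat it as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (positions with activation > threshold) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 7 of which have a non-zero activation.
sample = [0.0] * 993 + [0.4, 1.2, 0.1, 2.5, 0.9, 0.3, 1.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.653% would mean the feature is active on roughly 6 or 7 of every 1000 token positions sampled.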