INDEX
Explanations
phrases indicating preference for a simpler or more direct approach
occurrences of the word "just."
Negative Logits
Token         Logit
challeng      -0.72
Palestin      -0.67
bestos        -0.62
PLUS          -0.62
antage        -0.62
protector     -0.62
Remastered    -0.61
ikuman        -0.61
adversary     -0.61
luster        -0.60
Positive Logits
Token         Logit
ifiable       1.15
ifications    1.06
if            0.98
ifi           0.95
ified         0.82
ify           0.82
itia          0.80
IFIC          0.79
ifiers        0.78
icia          0.78
Activations Density 0.095%
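
As a rough illustration of how a density figure like the one above might be read, the sketch below assumes that "density" means the fraction of sampled tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample, invented only to match the scale of the reported
# figure: 19 active tokens out of 20,000.
sample = [0.0] * 19_981 + [1.3] * 19
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.095% would mean the feature fires on roughly one token in a thousand, which is consistent with a feature tied to a narrow lexical pattern.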