Explanations
verbs or adjectives expressing positive reactions or qualities
expressions of surprise or amazement
Negative Logits
Token          Logit
Gemini         -0.83
OPLE           -0.70
Plum           -0.70
Luxem          -0.69
Princ          -0.69
Viet           -0.67
onnaissance    -0.65
Prin           -0.63
士             -0.63
[+             -0.63
Positive Logits
Token          Logit
akens           1.32
aw              1.31
akening         0.97
atche           0.95
keye            0.87
saw             0.86
ashington       0.81
kward           0.81
awk             0.79
dry             0.78
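The logit lists above rank tokens by how strongly this feature pushes their output logits up or down. A common way dashboards produce such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k largest and k smallest values. The sketch below uses tiny made-up vectors; the helper names `logit_effects` and `top_and_bottom` are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumption: a feature's effect on a token's logit is the dot product of
    # the feature's decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by effect; the tail gives the positive list,
    # the head gives the negative list.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = list(reversed(ranked))[:k]
    negative = ranked[:k]
    return positive, negative


# Toy 3-dimensional example (made-up numbers, not taken from the model).
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    "akens": [1.0, 0.2, -0.6],
    "saw": [0.5, 0.0, 0.0],
    "Gemini": [-0.5, 1.0, 1.0],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(positive, negative)
```

In this toy setup "akens" has the largest positive effect and "Gemini" the most negative, mirroring the layout of the two lists above. The numbers themselves are illustrative only.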
Activation density: 0.003%
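An activation density of 0.003% means the feature is active on roughly 3 out of every 100,000 tokens. The sketch below shows one plausible way to compute such a figure, assuming "active" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold are assumptions, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 3 activate the feature.
sample = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    sample[i] = 1.7
print(activation_density(sample))
```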