INDEX
Explanations
comparisons using the phrase "not as" followed by a subjective quality being examined
comparisons using the word "as."
Negative Logits
Leaks       -0.77
PLA         -0.74
WAYS        -0.72
guiActive   -0.69
é¾          -0.68
DIT         -0.67
raltar      -0.65
sqor        -0.64
çͰ          -0.64
å§«         -0.63
Positive Logits
flashy      1.09
glamorous   1.02
drastic     0.99
easily      0.96
egregious   0.96
bad         0.95
pronounced  0.95
readily     0.95
easy        0.95
robust      0.94
Activation Density: 0.044%