INDEX
Explanations
adjectives related to quality or performance
terms related to mixed outcomes and varying quality
Negative Logits
dfx            -0.73
Encyclopedia   -0.72
smashed        -0.61
Ships          -0.61
RL             -0.61
oin            -0.61
enium          -0.61
Xan            -0.61
OM             -0.61
Moons          -0.59
Positive Logits
distingu       0.72
ikawa          0.70
conserv        0.69
6666           0.67
luster         0.67
dism           0.67
abwe           0.65
conduc         0.65
fared          0.65
Compar         0.65
Activations Density: 0.485%