Explanations
comparisons or contrasts
Negative Logits
bilt                -0.78
âĹ¼                 -0.75
Nanto               -0.72
soType              -0.70
Warning             -0.69
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.68
CLASSIFIED          -0.68
ãĥĵ                 -0.68
ãģ®ç                -0.67
interstitial        -0.67
Positive Logits
mere                0.96
merely              0.90
simply              0.90
superficial         0.84
simple              0.79
oneself             0.76
partisans           0.73
brute               0.73
aesthetics          0.72
cosmetic            0.72
Activations Density 0.105%