INDEX
Explanations
phrases related to self-identification or self-describing entities
Negative Logits
XIII         -0.76
IUM          -0.74
Ashe         -0.71
Rouge        -0.71
rium         -0.68
Syndicate    -0.67
ICAN         -0.67
mingham      -0.66
Nights       -0.66
ī            -0.65
Positive Logits
destruct      1.23
lessly        1.09
destruct      1.06
same          1.06
explanatory   1.00
ridges        0.98
contained     0.95
conscious     0.90
proclaimed    0.89
pecially      0.87
Activations Density: 0.022%