INDEX
Explanations
phrases indicating lack of connection or relevance to a specific topic
Negative Logits
©¶æ¥µ    -0.79
DL       -0.76
psons    -0.72
aido     -0.71
kai      -0.71
Ahead    -0.70
nas      -0.70
hiba     -0.69
uvian    -0.69
         -0.69
Positive Logits
determining    0.87
upholding      0.86
politics       0.86
criminality    0.84
aesthetics     0.81
legality       0.81
deciding       0.80
ethnicity      0.80
realism        0.80
moderation     0.80
Activations Density 0.047%