Explanations
specific characters or symbols in the text
the end-of-text token
Negative Logits
explan       -0.93
agre         -0.90
chnology     -0.83
ende         -0.83
behavi       -0.82
ngth         -0.80
obser        -0.79
horizont     -0.76
viability    -0.75
independ     -0.75
Positive Logits
é¾į          0.93
ef           0.91
ãĤ±          0.87
°            0.82
irect        0.82
ļ            0.82
º            0.82
Ĭ            0.80
¤            0.80
Counter      0.79
Activations Density 0.041%