INDEX
Explanations
expressions of desire and choice regarding actions or preferences
Negative Logits
[token missing]    -0.50
Zep                -0.45
Frag               -0.44
ím                 -0.42
Zelt               -0.41
Count              -0.41
地道               -0.40
trate              -0.40
Working            -0.40
이는               -0.39
Positive Logits
desired            1.08
desire             1.06
desire             0.98
desired            0.93
wished             0.86
wish               0.86
desires            0.83
convenient         0.80
IndexPath          0.80
argout             0.78
Activations Density: 0.202%