Explanations
phrases indicating potential actions or capabilities
Negative Logits
chi̍t          -0.54
unknownFields  -0.53
buttonShape    -0.52
aimerais       -0.52
Rüyada         -0.49
hoenix         -0.48
posedge        -0.48
好きです        -0.47
大好きです      -0.47
bahkan         -0.47
Positive Logits
möglichst      0.82
puissiez       0.79
becomes        0.73
can            0.72
بتوان          0.70
easier         0.68
possa          0.68
就不会          0.68
才會            0.65
become         0.64
Activations Density: 0.171%