INDEX
Explanations
expressions of personal reflections or emotional experiences
Negative Logits
now       -0.18
now       -0.17
ags       -0.16
reds      -0.15
manuel    -0.15
ajar      -0.15
opot      -0.14
Krish     -0.14
onde      -0.14
rema      -0.14
Positive Logits
future    0.25
future    0.22
futuro    0.20
будущ     0.20
Future    0.19
Future    0.18
_future   0.17
Kaynak    0.16
майбут    0.16
未来      0.16
Activations Density 0.003%