Explanations
phrases indicating initial perceptions or actions
at first, initially
Negative Logits
Chwiliwch          -0.71
Hentet             -0.70
Życiorys           -0.66
Tembelea           -0.66
AddTagHelper       -0.64
nahilalakip        -0.63
MessageTagHelper   -0.62
ostavi             -0.62
kasarigan          -0.60
RTSC               -0.60
Positive Logits
Initially    0.61
Initially    0.57
最初は       0.47
Sometimes    0.43
pikir        0.41
awalnya      0.40
Sometimes    0.39
initially    0.39
Monfieur     0.38
Usually      0.36
Activations Density 0.015%
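
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the threshold, and the sample values are all assumptions made for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration; the dashboard may compute
    density differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 token positions.
sample = [0.0] * 19_997 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.015% would mean the feature is active on roughly 15 of every 100,000 token positions, which fits a feature tied to a narrow set of phrases.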