Explanations
words related to time, particularly years and dates
Negative Logits
anes       -0.17
::-        -0.15
급         -0.14
bang       -0.13
BSD        -0.13
Furn       -0.13
Bowling    -0.13
肥         -0.13
æ©         -0.13
oder       -0.13
Positive Logits
worked      0.23
çalış       0.22
lavor       0.21
stud        0.20
trabaj      0.19
worked      0.19
studied     0.19
studi       0.19
studying    0.18
работ       0.18
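Lists like the positive and negative logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. The sketch below shows that ranking step on random toy matrices. It is an assumption about how such lists are built, not the procedure behind these particular numbers: the names `top_logit_effects`, `w_dec_row`, `W_U`, and `vocab` are hypothetical, and any final LayerNorm or scaling is ignored.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by how strongly one feature direction raises or lowers their logits.

    Assumption: score = decoder direction (d_model,) @ unembedding (d_model, vocab_size),
    with no LayerNorm or other scaling applied.
    """
    scores = w_dec_row @ W_U               # one score per vocabulary token
    order = np.argsort(scores)             # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (hypothetical sizes, not a real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
W_U = rng.normal(size=(d_model, vocab_size))
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec_row = rng.normal(size=d_model)

neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

With `k=10`, each list would have the same length as the ones shown on this page.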
Activation Density: 0.039%
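Activation density is the share of sampled tokens on which the feature is active. A density of 0.039% works out to about 39 active tokens per 100,000. The minimal sketch below assumes "active" means an activation strictly above zero. The dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy example: a feature that fires on 39 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:39] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.039%"
```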