Explanations
phrases indicating temporal context or reference to past events
Negative Logits
exitRule        -0.48
wasteful        -0.42
PreferredItem   -0.41
rusted          -0.41
orgull          -0.39
Waste           -0.39
Autoritní       -0.39
Să              -0.39
stupid          -0.39
Kesimpulan      -0.38
Positive Logits
kasarigan       0.47
>=",            0.46
TestingModule   0.46
year            0.45
ⓧ               0.45
TagMode         0.44
"..\..\..\      0.42
новниш          0.42
TokenNameEQUAL  0.42
Dispatchers     0.41
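Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that convention. The function names `direct_logit_effects` and `top_and_bottom_tokens` are hypothetical, and the dashboard may additionally normalize or scale its values.

```python
from typing import List, Sequence, Tuple


def direct_logit_effects(direction: Sequence[float],
                         w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix w_u of shape (d_model, vocab_size), one logit per vocab entry."""
    if len(direction) != len(w_u):
        raise ValueError("direction length must equal the number of rows in w_u")
    vocab_size = len(w_u[0])
    return [sum(direction[i] * w_u[i][v] for i in range(len(direction)))
            for v in range(vocab_size)]


def top_and_bottom_tokens(logits: Sequence[float], vocab: Sequence[str], k: int
                          ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]           # smallest logits first
    positive = ranked[::-1][:k]     # largest logits first
    return positive, negative


# Toy example: only the first model dimension contributes to the projection.
logits = direct_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]])
positive, negative = top_and_bottom_tokens(logits, ["a", "b", "c"], k=1)
print(positive, negative)  # [('a', 0.5)] [('b', -0.2)]
```

Sorting the full vocabulary once and slicing both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.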
Activation Density: 0.209%
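Activation density is usually reported as the share of token positions on which the feature is active. This sketch assumes that definition and treats "active" as an activation strictly above a threshold of zero. Both the threshold and the function name `activation_density` are illustrative assumptions, not the dashboard's documented method.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Two active positions out of 1000 tokens gives a density of 0.2%.
sample = [0.0] * 998 + [1.2, 0.5]
print(activation_density(sample))
```

On this definition, a density of 0.209% means the feature fires on roughly 2 of every 1,000 tokens.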