Explanations
phrases or expressions indicating difficulty or challenges
Negative Logits
Token     Logit
yar       -0.15
ÏģÏħ      -0.15
ampus     -0.15
osen      -0.14
unger     -0.14
alı       -0.14
nid       -0.13
iw        -0.13
ainers    -0.13
erk       -0.13
Positive Logits
Token     Logit
lay       0.47
average   0.41
non       0.39
nov       0.38
casual    0.37
Average   0.36
Lay       0.35
average   0.34
Average   0.34
Non       0.33
Activations Density: 0.074%