INDEX
Explanations
phrases indicating affirmation or emphasis
phrases that indicate reasoning or justification for actions and situations
Negative Logits
robe        -0.79
uttering    -0.67
still       -0.65
istically   -0.62
lich        -0.62
ature       -0.62
jj          -0.61
ãĥ¼ãĥ³      -0.61
eer         -0.60
UNE         -0.60
Positive Logits
happens        1.13
soever         1.13
happened       1.06
happ           0.94
separates      0.83
transpired     0.77
distinguishes  0.76
atus           0.73
motiv          0.73
Happ           0.73
Activations Density 0.037%
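As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of token positions where the feature's activation is above zero. The page does not state this definition, and the function name, threshold, and sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 37 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(37):
    acts[i * 2_000] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.037% corresponds to the feature firing on roughly 37 of every 100,000 tokens.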