INDEX
Explanations
personal experiences and reflections shared in a narrative form
Negative Logits
Token        Logit
States       -0.66
rouse        -0.63
oses         -0.61
apo          -0.59
istically    -0.59
ifles        -0.58
idental      -0.57
erning       -0.55
warts        -0.54
ierrez       -0.54
Positive Logits
Token        Logit
ĸļ           0.73
awhile       0.67
downhill     0.61
proven       0.60
since        0.58
Gone         0.56
cffffcc      0.55
plenty       0.55
awfully      0.54
ī            0.54
Activations Density 12.516%