INDEX
Explanations
references to various practices or behaviors
mentions of the word "practice."
Negative Logits
Token      Logit
onge       -0.76
gin        -0.74
panc       -0.72
aman       -0.72
aughter    -0.67
oor        -0.66
aline      -0.65
inki       -0.64
eele       -0.64
arger      -0.64
Positive Logits
Token            Logit
practice         1.02
Practices        0.94
Practice         0.91
practiced        0.88
practices        0.86
practise         0.86
practicing       0.85
pract            0.82
practice         0.81
practitioners    0.77
Activations Density: 0.022%