INDEX
Explanations
positive expressions of personal experiences or milestones
Negative Logits
Token     Logit
Mug       -0.17
moz       -0.16
526       -0.16
adera     -0.15
527       -0.15
mong      -0.14
ALA       -0.14
mong      -0.14
ssel      -0.14
ADER      -0.13
Positive Logits
Token     Logit
mat       1.08
Matt      1.07
matt      0.98
matrix    0.97
Mat       0.96
mat       0.95
MAT       0.95
Matt      0.94
Matthew   0.94
matrices  0.94
Activations Density: 0.061%