INDEX
Explanations
references to spatial relationships or positioning
Negative Logits
themſelves    -0.88
Theſe         -0.87
itſelf        -0.83
himſelf       -0.76
Anſ           -0.75
Majefty       -0.72
Chriftian     -0.70
་་            -0.69
Efq           -0.68
kwanza        -0.67
Positive Logits
с             0.85
по            0.80
С             0.71
С             0.69
со            0.64
sweet         0.62
za            0.62
s             0.61
sweet         0.61
con           0.59
Activations Density 0.015%