Explanations
data formatted as bullet points
numerical information or specific measures related to processes or procedures
Negative Logits
Wander    -0.78
Thornton  -0.71
ardless   -0.69
atis      -0.66
midway    -0.66
yan       -0.66
neighb    -0.65
sburgh    -0.65
cules     -0.65
ury       -0.64
Positive Logits
âĹı       1.10
âĸł       0.97
âľ        0.85
·         0.84
âĺħ       0.84
âĢ¢       0.82
ðĿ        0.80
³³³       0.80
³³³³³³³³  0.78
³³³³      0.76
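
Several positive-logit tokens (for example âĢ¢) look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The sketch below assumes a GPT-2-style byte-to-character table, which the page does not confirm, and maps each displayed string back to bytes and then to UTF-8 text. The names bytes_to_unicode, token_bytes and decode_token are illustrative helpers, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard shows tokens using this mapping.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Every remaining byte gets the next code point starting at U+0100.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token):
    """Return the raw bytes behind a displayed token string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token):
    """Decode a displayed token to text; incomplete UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĹı", "âĸł", "âĺħ", "âĢ¢", "âľ", "ðĿ"]:
        print(repr(tok), token_bytes(tok).hex(), repr(decode_token(tok)))
```

Under that assumption, âĹı, âĸł, âĺħ and âĢ¢ decode to ●, ■, ★ and •, which fits the "bullet points" explanation. âľ and ðĿ are only the leading bytes of longer UTF-8 sequences, so they do not decode to a complete character on their own.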
Activations Density 0.090%