INDEX
Explanations
years from the 80s and 90s
references to decades, particularly the 1980s and 1990s
Negative Logits
affirmation      -0.70
vain             -0.68
fronts           -0.64
pse              -0.63
justification    -0.61
exha             -0.61
foundations      -0.60
plateau          -0.60
aim              -0.60
sympathetic      -0.60
Positive Logits
20439    1.27
70       1.01
40       0.98
00       0.96
71       0.94
ï¸ı      0.93
76       0.93
66       0.92
39       0.92
69       0.92
Activations Density 0.101%