INDEX
Explanations
references to social media interactions and commentary
New Auto-Interp
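Negative Logits
(token not recovered)   -0.57
(token not recovered)   -0.47
Reception               -0.46
ISHI                    -0.46
пого                    -0.45
erfolgre                -0.44   (German fragment, cf. "erfolgreich", "successful")
inspiring               -0.44
attend                  -0.43
учета                   -0.43   (Russian, "accounting, record-keeping")
прием                   -0.43   (Russian, "reception")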
Positive Logits
Diſ                      1.05
Reſ                      1.05
Anſ                      1.04
purpoſe                  1.04
Houſe                    1.00
pleaſure                 0.99
Inſ                      0.98
ſever                    0.98
Majefty                  0.97
Efq                      0.96
Activation Density 0.169%
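The logit lists and the density figure in this panel can be approximated as in the sketch below. It is a minimal illustration under stated assumptions, not the code that produced this dashboard: it assumes (1) that a feature's logit effect on each vocabulary token is its decoder direction projected through the model's unembedding matrix W_U (ignoring any final layer norm), and (2) that activation density is the fraction of evaluated tokens on which the feature's activation is strictly positive. The names `logit_effects`, `activation_density`, `decoder_vec`, and `W_U` are hypothetical, and the toy vocabulary and weights are made up.

```python
import numpy as np


def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumption (hypothetical): the feature's direct effect on the output
    logits is decoder_vec @ W_U, with decoder_vec of shape (d_model,) and
    W_U of shape (d_model, vocab_size). Final layer norm is ignored.
    """
    effects = decoder_vec @ W_U           # one score per vocabulary token
    order = np.argsort(effects)           # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(acts):
    """Fraction of tokens on which the feature fires (activation > 0)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size


# Toy example: a 3-dimensional model with a 5-token vocabulary.
vocab = ["Majefty", "Efq", "Reception", "attend", "the"]
W_U = np.array([
    [1.0, 0.5, 0.0, -0.5, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.75, 0.0, 0.0],
])
decoder_vec = np.array([1.0, 0.0, -1.0])
neg, pos = logit_effects(decoder_vec, W_U, vocab, k=2)
print(neg)  # [('Reception', -0.75), ('attend', -0.5)]
print(pos)  # [('Majefty', 1.0), ('Efq', 0.5)]

# A feature active on 169 of 100,000 tokens has a density of 0.169%.
acts = np.zeros(100_000)
acts[:169] = 2.0
print(f"{activation_density(acts):.3%}")  # 0.169%
```

Sorting the full vector of per-token effects once and reading both ends keeps the negative and positive lists consistent with each other; on a real vocabulary of tens of thousands of tokens, `np.argpartition` could replace the full sort if only the top k are needed.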