INDEX
Explanations
Phrases mentioning specific names, likely in political or social contexts.
References to specific individuals, particularly those named "Nad" and "Raf."
Negative Logits
chnology    -0.73
vernment    -0.70
FTWARE      -0.67
fare        -0.67
verage      -0.66
ODUCT       -0.64
Unch        -0.61
ative       -0.60
テ          -0.59
redits      -0.59
Positive Logits
Nad         1.01
Seym        0.87
inet        0.86
anian       0.82
quet        0.80
vich        0.78
ael         0.77
inished     0.76
ules        0.76
enta        0.75
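Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. The sketch below assumes that convention; the helper `top_bottom_logits`, the toy decoder vector, and the toy vocabulary are hypothetical and not taken from this page.

```python
import numpy as np

def top_bottom_logits(w_dec_row, w_unembed, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary.

    Assumed shapes (hypothetical convention):
      w_dec_row: (d_model,) decoder vector for a single feature
      w_unembed: (d_model, vocab_size) unembedding matrix
      vocab:     list of vocab_size token strings
    Returns (positive, negative), each a list of k (token, value) pairs.
    """
    logits = w_dec_row @ w_unembed          # (vocab_size,) direct effect on each token
    order = np.argsort(logits)              # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: identity unembedding, so the logits equal the decoder vector.
pos, neg = top_bottom_logits(
    np.array([1.0, -0.5, 0.0, 0.25]), np.eye(4), ["a", "b", "c", "d"], k=2
)
print(pos)  # [('a', 1.0), ('d', 0.25)]
print(neg)  # [('b', -0.5), ('c', 0.0)]
```

Real dashboards may also fold in normalization or scaling steps; this sketch ignores them.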
Activation Density: 0.039%
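Activation density is the share of token positions on which the feature fires. At 0.039%, that works out to roughly one token in 2,600. A minimal sketch, assuming "fires" means an activation above zero and that activations are available as a flat array; `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(feature_acts)
    return float(np.mean(acts > threshold))

# Toy example: 4 nonzero activations out of 10,000 tokens.
acts = np.zeros(10_000)
acts[[3, 512, 4096, 9999]] = [0.7, 1.2, 0.3, 2.0]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.040%
```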