Explanations
phrases related to geopolitical relationships and interactions
Negative Logits
WriteTagHelper    -0.86
ftate             -0.82
myſelf            -0.78
Monfieur          -0.78
fubject           -0.76
expandindo        -0.76
perfons           -0.74
pleaſure          -0.72
NSCoder           -0.72
Jefus             -0.71
Positive Logits
forgiving         0.65
forgiven          0.57
friendly          0.56
truce             0.56
goodwill          0.53
ändå              0.52
grud              0.51
accept            0.51
concili           0.48
benign            0.48
Activations Density 0.453%